How can a computer scientist study computational biology?

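A duck wanted to learn how to fly.

“Ducks don’t fly.”

– But I want to learn.

“Then I will teach you. But it will be hard.”

So the duck practiced flying really hard. And at last! The duck was able to fly.

“Go on, let me see you fly.”

The duck flew beautifully through the sky.

– Thank you.

“It is all thanks to your own hard work.”

– Now I can go back home and show off.

And the duck walked back home.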

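After all, what is the point of learning something if you never use it….. These days I am getting interested in studying English (to be precise, I am in a situation where I have to be -_-), and since they say you improve by writing, I wrote the rest in English. If anything is wrong, please let me know ^^;;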


When I first visited Prof. Park to ask for a chance to study database systems as a member of his laboratory, he suggested that I study computational biology instead, explaining the potential of the field. However, I was afraid because I did not know biology well.

He suggested the study of string algorithms as a starting point, so I started reading Algorithms on Strings, Trees, and Sequences by Dan Gusfield. While reading the book, I found some important passages in which the author explains how a computer scientist can dive into this field with little understanding of biology.

Our group followed the standard assumption that biologically meaningful results could come from considering DNA as a one-dimensional character string, abstracting away the reality of DNA as a flexible three-dimensional molecule, interacting in a dynamic environment with protein and RNA, and repeating a life-cycle in which even the classic linear chromosome exists for only a fraction of the time. A similar, but stronger assumption existed for protein, holding, for example, that all the information needed for correct three-dimensional folding is contained in the protein itself, essentially independent of the biological environment the protein lives in. This assumption has recently been modified, but remains largely intact. For nonbiologists, these assumptions were (and remain) a godsend, allowing rapid entry into an exciting and important field. Reinforcing the importance of sequence-level investigation were statements such as:

The digital information that underlies biochemistry, cell biology, and development can be represented by a simple string of G’s, A’s, T’s and C’s. This string is the root data structure of an organism’s biology.


In a very real sense, molecular biology is all about sequences. First, it tries to reduce complex biochemical phenomena to interactions between defined sequences…


The ultimate rationale behind all purposeful structures and behavior of living things is embodied in the sequence of residues of nascent polypeptide chains… In a real sense it is at this level of organization that the secret of life (if there is one) is to be found.

So without worrying much about the more difficult chemical and biological aspects of DNA and protein…

This excerpt convinced me that studying this book will be instrumental in building a firm foundation for the study of computational biology. Of course, string algorithms will also be helpful in studying database systems: since the advent of XML, more and more data is stored as text.

So far I’ve covered the basic algorithms: the Z algorithm, KMP, and Boyer-Moore. Since I already learned KMP and Boyer-Moore in the Algorithm Design class last semester, nothing has been hard to understand up to this point. (You will agree with me that Prof. Yang’s Algorithm Design class was extremely difficult and painful because of the extremely fast pace of the classwork, the difficult homework, and the panic caused by the final exam: we had to read two books in a few days.)
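For readers who have not met the Z algorithm, here is a minimal Python sketch of the idea as I understand it: for each position i, Z[i] is the length of the longest substring starting at i that matches a prefix of the string, and all values can be computed in linear time by reusing the rightmost match found so far. This is my own illustration, not code from the book. The function names `z_array` and `find_all` are made up for this post, the separator `$` is assumed not to occur in the pattern or the text, and setting Z[0] to the string length is just a convention (the book only defines Z for positions after the first).

```python
def z_array(s):
    """Return Z, where Z[i] is the length of the longest substring of s
    starting at i that is also a prefix of s (Z[0] = len(s) by convention)."""
    n = len(s)
    z = [0] * n
    if n == 0:
        return z
    z[0] = n
    left, right = 0, 0  # s[left:right] is the rightmost prefix match seen so far
    for i in range(1, n):
        if i < right:
            # i lies inside a known match, so reuse the value computed for
            # the corresponding position in the prefix (capped at the box end).
            z[i] = min(right - i, z[i - left])
        # Extend the match by explicit character comparisons.
        while i + z[i] < n and s[z[i]] == s[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


def find_all(pattern, text, sep="$"):
    """Return every start index of pattern in text.
    Assumes sep occurs in neither pattern nor text."""
    if not pattern:
        return []
    m = len(pattern)
    z = z_array(pattern + sep + text)
    # A Z-value equal to len(pattern) marks a full occurrence of the pattern.
    return [i - m - 1 for i in range(m + 1, len(z)) if z[i] == m]


if __name__ == "__main__":
    print(find_all("ATA", "GATATATGCATATACTT"))
```

Treating DNA as a plain string of G, A, T, and C is exactly what makes a search like this possible: the pattern `ATA` is found at every position, including overlapping ones, in a single pass over the text.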

I am in a crisis, however. I have to understand Boyer-Moore thoroughly to read chapter three, and Prof. Yang’s lecture gave only a cursory overview of the Boyer-Moore algorithm, as he himself stated.

I am happy now, however. I have really wanted to study ‘real’ computer science, and I truly hate the study of ‘techniques’ that so many people are excited about. Techniques are only techniques. There is no happiness in merely applying simple tools. So I am extremely happy that I can delve into the theory of computer science.
