# 2.3 Modeling the genefinding problem

## An example: finding CpG islands

This example is taken from the excellent textbook Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids by Durbin, Eddy, Krogh and Mitchison. CpG islands are regions of the genome with a higher than normal percentage of C and G bases adjacent to each other. The usual percentage of adjacent CG bases in the genome is about 1%, but in CpG islands that percentage is over 6%. C followed by G is relatively rare in the genome because the cytosine in a CG dinucleotide is typically methylated, and methylated cytosine has a relatively high chance of mutating into thymine. The "p" in "CpG" refers to the phosphodiester bond between the cytosine and the guanine, and serves to distinguish it from the C and G pairing across the two strands of the DNA double helix. CpG islands are biologically interesting because they are in or near 40% of the promoters of mammalian genes and 70% of the promoters of human genes. CpG islands vary in length between 300 and 3000 base pairs, so approaches based on fixed-length consensus sequences do not work well for detecting them. Effective identification of CpG islands can aid in localizing genes in eukaryotes. CpG island detection also serves as an excellent problem to illustrate the power of Markov models.
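To make the "percentage of adjacent CG bases" concrete, here is a minimal Python sketch that counts how often a C is immediately followed by a G in a sequence. The function name `cg_dinucleotide_fraction` and the toy input string are illustrative assumptions, not part of the original text or of any library.

```python
def cg_dinucleotide_fraction(seq: str) -> float:
    """Return the fraction of adjacent position pairs (i, i+1) that read 'CG'."""
    seq = seq.upper()
    pairs = len(seq) - 1          # number of adjacent pairs in the sequence
    if pairs <= 0:
        return 0.0                # too short to contain any dinucleotide
    hits = sum(1 for i in range(pairs) if seq[i:i + 2] == "CG")
    return hits / pairs


# Toy (hypothetical) sequence: the pairs are AC, CG, GC, CG, GT, so 2 of 5 are CG.
print(cg_dinucleotide_fraction("ACGCGT"))  # 0.4
```

On real data, a value near 0.01 would match the genome-wide figure quoted above, while a value above 0.06 would be in the range quoted for CpG islands.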

We will consider two problems.

• Given a short DNA sequence, does it come from a CpG island or not?
• Given a long DNA sequence, find all the CpG islands on it, if any.

## Generative models of biological sequences

We will construct generative models of CpG islands. A generative model produces strings, and the model parameters are tuned to reflect the characteristics of CpG islands.

The simplest probabilistic generative DNA sequence model associates a probability with the occurrence of each base: P(A), P(C), P(G) and P(T), such that these four probabilities sum to 1. For H. influenzae, these probabilities are P(A) = 0.3, P(C) = 0.2, P(G) = 0.2, and P(T) = 0.3. To generate a sequence based on this model, we first choose the length L of the sequence that we wish to construct. Then we draw a base for each position from the discrete distribution above, as shown in the code fragment below.

    i = 1
    while i <= L do
        S[i] = a base drawn from the discrete probability distribution [0.3, 0.2, 0.2, 0.3] (for A, C, G, T)
        i = i + 1
    end
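The pseudocode above can be sketched in runnable Python as follows. The probabilities are the H. influenzae values quoted in the text; the names `BASES`, `BASE_PROBS` and `generate_independent`, and the optional `rng` argument used for reproducibility, are assumptions made for this illustration.

```python
import random

BASES = "ACGT"
BASE_PROBS = [0.3, 0.2, 0.2, 0.3]  # P(A), P(C), P(G), P(T) from the text


def generate_independent(length, rng=None):
    """Draw `length` bases independently from the fixed base distribution."""
    if rng is None:
        rng = random.Random()  # unseeded generator; pass a seeded one to reproduce runs
    # random.choices samples with replacement according to the given weights.
    return "".join(rng.choices(BASES, weights=BASE_PROBS, k=length))


# Example: a 30-base sequence from a seeded generator (the seed is arbitrary).
print(generate_independent(30, random.Random(0)))
```

Because each position is drawn on its own, the expected fraction of adjacent CG pairs under this model is simply P(C) × P(G) = 0.04, whatever the surrounding bases are.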

This model does not capture interdependencies between bases. It assumes that the choice of base in each position of the generated sequence is independent of the bases surrounding it. A more complex model of DNA sequences can be constructed using the theory of Markov chains. In Markov chains, the probability of observing a base at a given position in a sequence is conditioned on the bases preceding it. Thus, Markov chains can model local correlations among the nucleotides. A Markov chain of order 1 assumes that the probability of a base at position i is dependent only on the base at position i - 1. A first order Markov chain can be specified by a probability matrix as shown below.

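A first-order Markov model for generating DNA sequences (rows: base at position i - 1; columns: base at position i)

| | A | C | G | T |
|---|---|---|---|---|
| A | 0.6 | 0.2 | 0.1 | 0.1 |
| C | 0.1 | 0.1 | 0.8 | 0.0 |
| G | 0.2 | 0.2 | 0.3 | 0.3 |
| T | 0.1 | 0.8 | 0.0 | 0.1 |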
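As a hedged illustration of how such a matrix can be used, the sketch below generates a sequence from the table above and scores a given sequence under it, using P(x) = P(x_1) · P(x_2 | x_1) · ... · P(x_L | x_{L-1}). Each row of the table sums to 1, so a row is read as the distribution of the next base given the preceding one. The text does not specify how the first base is chosen; the uniform `INITIAL` distribution is an assumption, as are all the names used here.

```python
import math
import random

BASES = "ACGT"
# TRANSITION[prev][j] = P(BASES[j] | prev); rows copied from the table above.
TRANSITION = {
    "A": [0.6, 0.2, 0.1, 0.1],
    "C": [0.1, 0.1, 0.8, 0.0],
    "G": [0.2, 0.2, 0.3, 0.3],
    "T": [0.1, 0.8, 0.0, 0.1],
}
# Assumption: no start distribution is given in the text, so use a uniform one.
INITIAL = [0.25, 0.25, 0.25, 0.25]


def generate_markov(length, rng=None):
    """Generate a sequence in which each base depends only on the previous base."""
    if rng is None:
        rng = random.Random()
    if length <= 0:
        return ""
    seq = rng.choices(BASES, weights=INITIAL, k=1)
    while len(seq) < length:
        row = TRANSITION[seq[-1]]  # distribution of the next base given the last one
        seq.append(rng.choices(BASES, weights=row, k=1)[0])
    return "".join(seq)


def log_probability(seq):
    """Natural-log probability of `seq` under the chain (-inf if impossible)."""
    if not seq:
        return 0.0
    logp = math.log(INITIAL[BASES.index(seq[0])])
    for prev, nxt in zip(seq, seq[1:]):
        p = TRANSITION[prev][BASES.index(nxt)]
        if p == 0.0:
            return float("-inf")  # the table forbids this transition
        logp += math.log(p)
    return logp


print(generate_markov(30, random.Random(0)))
print(log_probability("ACGT"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied along a long sequence. With this particular table, the transitions C→T and T→G have probability 0, so generated sequences never contain "CT" or "TG".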

Source:  OpenStax, Statistical machine learning for computational biology. OpenStax CNX. Oct 14, 2007 Download for free at http://cnx.org/content/col10455/1.2