An Optimal DNA Segmentation Based on the MDL Principle

  • Authors:
  • Wojciech Szpankowski;Wenhui Ren;Lukasz Szpankowski

  • Venue:
  • CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
  • Year:
  • 2003

Abstract

The biological world is highly stochastic as well as inhomogeneous in its behavior. The transitions between homogeneous and inhomogeneous regions of DNA, also known as change points, carry important biological information. Our goal is to employ rigorous methods of information theory to quantify structural properties of DNA sequences. In particular, we adopt the Stein-Ziv lemma to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source while assuring exponentially small false positives. Then we apply the Minimum Description Length (MDL) principle to select the parameters of our segmentation algorithm. Finally, we perform extensive experimental work on human chromosome 9. After grouping A and G (purines) and T and C (pyrimidines), we discover change points between coding and noncoding regions as well as the beginning of a CpG island.
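The purine/pyrimidine grouping and the idea of testing whether two segments come from the same source can be illustrated with a minimal sketch. This is not the authors' method: the Jensen-Shannon divergence is swapped in as a simple stand-in for the Stein-Ziv-based discriminant, and the function names and the fixed `window` parameter are hypothetical (the paper selects its parameters via MDL, which is not shown here).

```python
import math
from collections import Counter

# Grouping from the abstract: A, G (purines) -> R; C, T (pyrimidines) -> Y.
GROUP = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def to_purine_pyrimidine(seq):
    """Map a DNA string onto the alphabet {R, Y}; other symbols (e.g. N) are skipped."""
    return "".join(GROUP[b] for b in seq.upper() if b in GROUP)


def empirical_dist(seq, alphabet="RY"):
    """Relative frequency of each symbol of `alphabet` in a non-empty string."""
    counts = Counter(seq)
    return [counts[s] / len(seq) for s in alphabet]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (assumed stand-in, not the paper's discriminant)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with zero probability contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def window_scores(seq, window):
    """Score each position by the divergence between its left and right windows."""
    ry = to_purine_pyrimidine(seq)
    scores = []
    for i in range(window, len(ry) - window + 1):
        left = empirical_dist(ry[i - window:i])
        right = empirical_dist(ry[i:i + window])
        scores.append((i, js_divergence(left, right)))
    return scores


if __name__ == "__main__":
    # Toy input: a purine-only run followed by a pyrimidine-only run.
    scores = window_scores("AAAAGGGGCCCCTTTT", window=4)
    print(max(scores, key=lambda t: t[1]))
```

Positions where the score peaks are candidate change points under this simplified criterion. A real analysis would need a principled threshold and window choice, which is the role the abstract assigns to the discriminant function and the MDL principle.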