The biological world is highly stochastic as well as inhomogeneous in its behavior. The transitions between homogeneous and inhomogeneous regions of DNA, also known as change points, carry important biological information. Our goal is to employ rigorous methods of information theory to quantify structural properties of DNA sequences. In particular, we adopt the Stein-Ziv lemma to find an asymptotically optimal discriminant function that determines whether two DNA segments are generated by the same source while ensuring an exponentially small false-positive probability. Then we apply the Minimum Description Length (MDL) principle to select the parameters of our segmentation algorithm. Finally, we perform extensive experimental work on human chromosome 9. After grouping A and G (purines) and T and C (pyrimidines), we discover change points between coding and noncoding regions as well as the beginning of a CpG island.
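To make the purine/pyrimidine grouping and the idea of a two-segment discriminant concrete, the following minimal Python sketch maps a DNA string onto a binary alphabet and scores how differently two segments use its symbols. It is an illustration under stated assumptions, not the discriminant derived from the Stein-Ziv lemma: the score used here is the length-weighted Jensen-Shannon divergence between empirical symbol frequencies, a simple stand-in, and the names `to_purine_pyrimidine` and `js_divergence` as well as the symbols `R` and `Y` are hypothetical choices made for this example.

```python
from collections import Counter
from math import log2

PURINES = set("AG")
PYRIMIDINES = set("CT")


def to_purine_pyrimidine(seq):
    """Map A/G to 'R' (purine) and C/T to 'Y' (pyrimidine)."""
    out = []
    for base in seq.upper():
        if base in PURINES:
            out.append("R")
        elif base in PYRIMIDINES:
            out.append("Y")
        else:
            raise ValueError(f"unexpected base: {base!r}")
    return "".join(out)


def empirical_distribution(seq, alphabet="RY"):
    """Relative frequency of each alphabet symbol in a non-empty sequence."""
    if not seq:
        raise ValueError("segment must be non-empty")
    counts = Counter(seq)
    return {s: counts[s] / len(seq) for s in alphabet}


def entropy(dist):
    """Shannon entropy in bits, skipping zero-probability symbols."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


def js_divergence(seg_a, seg_b, alphabet="RY"):
    """Length-weighted Jensen-Shannon divergence between two segments.

    Assumption: this is a stand-in score for illustration only. Values
    near 0 suggest similar symbol usage; larger values suggest the
    segments may come from different sources.
    """
    p = empirical_distribution(seg_a, alphabet)
    q = empirical_distribution(seg_b, alphabet)
    w_a = len(seg_a) / (len(seg_a) + len(seg_b))
    w_b = 1.0 - w_a
    mix = {s: w_a * p[s] + w_b * q[s] for s in alphabet}
    return entropy(mix) - w_a * entropy(p) - w_b * entropy(q)


if __name__ == "__main__":
    left = to_purine_pyrimidine("AAGGAGAG")
    right = to_purine_pyrimidine("CTTCCTCT")
    print(left, right, js_divergence(left, right))
```

A candidate change point could then be scored by comparing the windows on either side of it. Choosing the window length and the decision threshold is the kind of parameter selection that the abstract assigns to the MDL principle, and it is not attempted in this sketch.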