An Optimal DNA Segmentation Based on the MDL Principle
CSB '03 Proceedings of the IEEE Computer Society Conference on Bioinformatics
An efficient normalized maximum likelihood algorithm for DNA sequence compression
ACM Transactions on Information Systems (TOIS)
Compression of Annotated Nucleotide Sequences
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)
MicroRNA target detection and analysis for genes related to breast cancer using MDLcompress
EURASIP Journal on Bioinformatics and Systems Biology
An optimal DNA segmentation based on the MDL principle
International Journal of Bioinformatics Research and Applications
Searching a pattern in compressed DNA sequences
International Journal of Bioinformatics Research and Applications
We discuss how to use the normalized maximum likelihood (NML) model for encoding sequences known to have regularities in the form of approximate repetitions. We present a particular version of the NML model for discrete regression, which is shown to provide a very powerful yet simple model for encoding the approximate repeats in DNA sequences. Combining the model of repeats with a simple first-order Markov model, we obtain a fast lossless compression method, which compares favorably with the existing DNA compression programs. It is remarkable that a simple model, which recursively updates a small number of parameters, is able to reach the state-of-the-art compression ratio for DNA sequences obtained with much more complex models. Being a minimum description length (MDL) model, the NML model may later prove to be useful in studying global and local features of DNA or possibly of other biological sequences.
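To make the "simple first-order Markov model" component more concrete, the following is a minimal sketch of how the ideal code length of a DNA sequence could be computed under an adaptive order-1 context model. It is not the authors' implementation: the function name `markov1_code_length`, the add-one initialization of the counts, and the uniform 2-bit cost for the first base are all assumptions made for illustration, and the NML model for approximate repeats is not covered here.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumed alphabet; real data may contain other symbols


def markov1_code_length(seq: str) -> float:
    """Return the ideal adaptive code length of `seq` in bits.

    Each base is predicted from the previous base only (order-1 context).
    Counts start at 1 for every symbol (an illustrative smoothing choice)
    and are updated after each base, so a decoder could mirror the model.
    """
    # counts[context][symbol] = number of times symbol followed context, plus 1
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 1))
    bits = 0.0
    prev = None
    for base in seq:
        if prev is None:
            # No context yet: charge log2(4) = 2 bits for the first base.
            bits += math.log2(len(ALPHABET))
        else:
            ctx = counts[prev]
            total = sum(ctx.values())
            # Ideal arithmetic-coding cost of this base: -log2(probability).
            bits -= math.log2(ctx[base] / total)
            ctx[base] += 1  # recursive update of the model parameters
        prev = base
    return bits


if __name__ == "__main__":
    sample = "ACGT" * 50
    total_bits = markov1_code_length(sample)
    print(f"{total_bits / len(sample):.3f} bits per base")
```

On a highly regular input such as the repeated `ACGT` pattern above, the cost per base falls well below the 2 bits of a fixed-length code, because every context quickly becomes predictable. In a combined scheme of the kind the abstract describes, a model like this would presumably serve as the fallback for regions where no good approximate repeat is available.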