Every data compression method assumes a certain model of the information source that produces the data, so improving a compression method also improves the model of the source. The closer the probability distribution of the assumed model is to the true distribution of the source, the smaller the relative entropy between them and, therefore, the fewer redundancy bits are required. For this reason, the importance of data compression goes beyond the usual goals of reducing storage space or transmission time. In some situations, seeking better models is the main aim, and in our view this is the case for DNA sequence data. In this paper, we give hints on how finite-context (Markov) modeling may be used for DNA sequence analysis through the construction of complexity profiles of the sequences. These profiles can unveil structures of the DNA, some of them with potential biological relevance.
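To make the idea of a complexity profile concrete, the following is a minimal sketch of an adaptive order-k finite-context model that reports, for each base, the number of bits the model would need to encode it. It is an illustration under stated assumptions, not the authors' implementation: the function name `complexity_profile`, the parameters `k` and `alpha`, and the additive-smoothing estimator are all hypothetical choices made here, and the input is assumed to contain only the uppercase symbols A, C, G and T.

```python
import math
from collections import defaultdict

ALPHABET = "ACGT"  # assumption: input uses only these uppercase symbols


def complexity_profile(seq, k=2, alpha=1.0):
    """Per-base information content (bits) under an adaptive order-k model.

    k     -- context length (hypothetical parameter name)
    alpha -- additive smoothing constant (hypothetical estimator choice)
    """
    # One table of symbol counts per context seen so far.
    counts = defaultdict(lambda: dict.fromkeys(ALPHABET, 0))
    profile = []
    for i, symbol in enumerate(seq):
        # The context is the (up to) k symbols preceding position i.
        context = seq[max(0, i - k):i]
        ctx_counts = counts[context]
        total = sum(ctx_counts.values())
        # Smoothed probability estimate from counts gathered before position i.
        p = (ctx_counts[symbol] + alpha) / (total + alpha * len(ALPHABET))
        # Ideal code length of this symbol in bits.
        profile.append(-math.log2(p))
        # Update the model only after the symbol has been "encoded".
        ctx_counts[symbol] += 1
    return profile


if __name__ == "__main__":
    repetitive = "ACACACACACACACACACAC"
    bits = complexity_profile(repetitive, k=2)
    print(" ".join(f"{b:.2f}" for b in bits))
```

Plotting the returned values against sequence position gives a profile of the kind described above: regions the model predicts well, such as repeats, show low values, while poorly predicted regions stay close to 2 bits per base. In practice the raw values would likely be smoothed with a sliding window before inspection.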