Recent advances in DNA sequencing technology have caused exponential growth in publicly available genomic sequence data. Whole genome alignments are a particularly voluminous and frequently used static data set of this kind. This paper introduces the first lossless compression algorithm for such data sets that is based on well-established statistical evolutionary models and on prediction techniques from lossless binary image compression. The compression rate is improved by a factor of 1.6 compared to the Lempel-Ziv (LZ) compression currently in use.