We investigate off-line, dictionary-oriented approaches to DNA sequence compression based on the Burrows-Wheeler Transform (BWT). The preponderance of short repeating patterns is an important phenomenon in biological sequences. Here, we propose off-line methods for compressing DNA sequences that exploit the different repetition structures inherent in such sequences. Repetition analysis is performed using the relationship between the BWT and important pattern matching data structures, such as the suffix tree and the suffix array. We discuss how the proposed approach can be incorporated into the BWT compression pipeline.
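The relationship between the BWT and the suffix array mentioned above can be illustrated with a minimal Python sketch. It is not the method proposed in the paper: the function names are hypothetical, the suffix array is built by naive sorting rather than a linear-time construction, and the input is assumed to end with a unique sentinel character `$` that sorts before every other symbol.

```python
def suffix_array(text):
    # Naive construction (assumption: fine for short illustrative inputs):
    # sort suffix start positions by the suffix that begins there.
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text, sa):
    # The i-th BWT symbol is the character that cyclically precedes the
    # i-th smallest suffix; text[-1] (the sentinel) covers the case i == 0.
    return "".join(text[i - 1] for i in sa)


def longest_repeat(text, sa):
    # Suffixes that share a long prefix are adjacent in sorted order, so the
    # longest repeated substring is the longest common prefix of some
    # neighbouring pair in the suffix array.
    best = ""
    for a, b in zip(sa, sa[1:]):
        k = 0
        while (a + k < len(text) and b + k < len(text)
               and text[a + k] == text[b + k]):
            k += 1
        if k > len(best):
            best = text[a:a + k]
    return best


if __name__ == "__main__":
    dna = "ACGTACGTTACG$"  # hypothetical example sequence with sentinel
    sa = suffix_array(dna)
    print(bwt_from_suffix_array(dna, sa))
    print(longest_repeat(dna, sa))
```

In this sketch the same sorted order of suffixes yields both the transform and the repeat information, which is the kind of connection a repetition analysis built on the BWT can draw on.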