Data compression is one of the most challenging arenas for both algorithm design and engineering. This is particularly true for Burrows-Wheeler compression, a technique that is important both in itself and for the design of compressed indexes. There has been considerable debate on how to design and engineer compression algorithms based on the BWT paradigm. In particular, Move-to-Front encoding is generally believed to be an "inefficient" part of the Burrows-Wheeler compression process. Only recently, however, have two theoretically superior alternatives to Move-to-Front been proposed, namely Compression Boosting and Wavelet Trees. The main contribution of this paper is the first experimental comparison of these three techniques, a much-needed methodological contribution to the current debate. We do so by providing a carefully engineered compression boosting library that can be used, on the one hand, to investigate the myriad new compression algorithms that can be based on boosting and, on the other hand, to make the first experimental assessment of how Move-to-Front behaves with respect to its recently proposed competitors. The main conclusion is that Boosting, Wavelet Trees, and Move-to-Front yield quite close compression performance. Finally, our extensive experimental study of the boosting technique brings to light a fact overlooked in 10 years of experiments in the area: a fast-adapting order-zero compressor is enough to provide state-of-the-art BWT compression by simply compressing the run-length encoded transform. In other words, Move-to-Front, Wavelet Trees, and Boosters can all be bypassed by a fast learner.
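The two back ends contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the boosting library described in the paper. It computes a naive Burrows-Wheeler transform and then estimates the ideal output size for two pipelines. The first applies Move-to-Front followed by an adaptive order-zero model. The second run-length encodes the transform and codes it directly with adaptive order-zero models. The estimate is the sum of -log2 p over the coded symbols, which an arithmetic coder approaches; no actual coder is run. All function names are hypothetical, and so are the count `increment`, the halving threshold `max_total` (used here as an assumed way to make the model "fast adapting"), and the `MAX_RUN` cap. The sketch makes no claim about the paper's measured results.

```python
import math

MAX_RUN = 255  # hypothetical cap on run lengths so lengths have a fixed alphabet


def bwt(text: str, sentinel: str = "\0") -> str:
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.
    Assumes `sentinel` does not occur in `text` and sorts before every symbol."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(data: str) -> list[int]:
    """Replace each symbol by its position in a recency list, then move it to the front."""
    recency = sorted(set(data))
    out = []
    for ch in data:
        idx = recency.index(ch)
        out.append(idx)
        recency.insert(0, recency.pop(idx))
    return out


def run_length_encode(data: str, max_run: int = MAX_RUN) -> list[tuple[str, int]]:
    """Collapse runs of equal symbols into (symbol, length) pairs, splitting long runs."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch and runs[-1][1] < max_run:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def adaptive_order0_bits(symbols, alphabet, increment=32, max_total=1 << 16) -> float:
    """Ideal code length, in bits, of an arithmetic coder driven by an adaptive
    order-zero frequency model. A large `increment` plus periodic halving of the
    counts makes the model favour recent statistics (an assumed "fast learner")."""
    counts = {s: 1 for s in alphabet}
    total = len(counts)
    bits = 0.0
    for s in symbols:
        bits -= math.log2(counts[s] / total)  # cost of s under the current model
        counts[s] += increment
        total += increment
        if total > max_total:
            for k in counts:
                counts[k] = (counts[k] + 1) // 2  # halve, keeping every count >= 1
            total = sum(counts.values())
    return bits


def compare(text: str) -> tuple[float, float]:
    """Estimated bits for BWT+MTF+order-0 versus BWT+RLE+order-0."""
    t = bwt(text)
    alphabet = sorted(set(t))
    mtf_bits = adaptive_order0_bits(move_to_front(t), range(len(alphabet)))
    runs = run_length_encode(t)
    rle_bits = (adaptive_order0_bits([c for c, _ in runs], alphabet)
                + adaptive_order0_bits([n for _, n in runs], range(1, MAX_RUN + 1)))
    return mtf_bits, rle_bits


if __name__ == "__main__":
    mtf_bits, rle_bits = compare("abracadabra " * 200)
    print(f"BWT + MTF + adaptive order-0: ~{mtf_bits / 8:.0f} bytes")
    print(f"BWT + RLE + adaptive order-0: ~{rle_bits / 8:.0f} bytes")
```

Replacing `adaptive_order0_bits` with a real arithmetic coder, or retuning `increment` and `max_total`, will change the numbers. The sketch only shows where Move-to-Front sits in the pipeline and what bypassing it with run-length encoding looks like.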