We provide a general boosting technique for Textual Data Compression. Qualitatively, it takes a good compression algorithm and turns it into an algorithm with a better compression performance guarantee. It displays the following remarkable properties: (a) it can turn any memoryless compressor into a compression algorithm that uses the "best possible" contexts; (b) it is very simple and optimal in terms of time; and (c) it admits a decompression algorithm that is also optimal in time. To the best of our knowledge, this is the first boosting technique displaying these properties.

Technically, our boosting technique builds upon three main ingredients: the Burrows-Wheeler Transform, the Suffix Tree data structure, and a greedy algorithm to process them. Specifically, we show that there exists a proper partition of the Burrows-Wheeler Transform of a string s that shows a deep combinatorial relation with the kth order entropy of s. That partition can be identified via a greedy processing of the suffix tree of s with the aim of minimizing a proper objective function over its nodes. The final compressed string is then obtained by individually compressing each substring of the partition by means of the base compressor we wish to boost.

Our boosting technique is inherently combinatorial because it does not need to assume any prior probabilistic model of the source emitting s, and it does not deploy any training, parameter estimation, or learning. Various corollaries are derived from this main achievement. Among others, we show analytically that, using our booster, we get better compression algorithms than some of the best existing ones, that is, LZ77, LZ78, PPMC, and the ones derived from the Burrows-Wheeler Transform. Further, we settle analytically some long-standing open problems about the algorithmic structure and the performance of BWT-based compressors. Namely, we provide the first family of BWT algorithms that do not use Move-To-Front or Symbol Ranking as a part of the compression process.
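
The idea of compressing each piece of a BWT partition separately can be illustrated with a small Python sketch. It is a simplification under stated assumptions, not the algorithm of the paper: the BWT is built naively by sorting rotations, the partition is the fixed one induced by length-k contexts (the paper instead finds its partition by a greedy pass over the suffix tree), and the "base compressor" is replaced by the order-0 empirical entropy cost of each piece. The function names and the `"\0"` sentinel are hypothetical choices made for this example.

```python
import math
from collections import Counter


def bwt_with_contexts(s, k, sentinel="\0"):
    """Naive BWT via sorted rotations (assumes the sentinel is not in s).

    Returns (bwt, contexts), where contexts[i] is the length-k prefix of
    the i-th sorted rotation, i.e. the k symbols that cyclically follow
    bwt[i] in the text.
    """
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    bwt = "".join(r[-1] for r in rotations)
    contexts = [r[:k] for r in rotations]
    return bwt, contexts


def order0_cost(piece):
    """|piece| * H_0(piece) in bits: a stand-in for a memoryless compressor."""
    n = len(piece)
    return sum(c * math.log2(n / c) for c in Counter(piece).values())


def partition_by_context(bwt, contexts):
    """Split bwt into maximal runs of positions that share the same context."""
    pieces, start = [], 0
    for i in range(1, len(bwt) + 1):
        if i == len(bwt) or contexts[i] != contexts[start]:
            pieces.append(bwt[start:i])
            start = i
    return pieces


def boosted_cost(s, k):
    """Total order-0 cost when each context-k piece is coded on its own."""
    bwt, contexts = bwt_with_contexts(s, k)
    return sum(order0_cost(p) for p in partition_by_context(bwt, contexts))


if __name__ == "__main__":
    text = "mississippi" * 3
    for k in range(3):
        print(k, round(boosted_cost(text, k), 2))
```

With k = 0 the whole BWT is a single piece, so the cost is that of the memoryless stand-in applied directly. Larger k refines the partition, and because the order-0 cost of a concatenation is never smaller than the sum of the costs of its parts, the total can only stay the same or decrease in this idealized model. Real compressors pay a per-piece overhead, which is one reason a fixed k is not always the right choice.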