Antisequential Suffix Sorting for BWT-Based Data Compression
IEEE Transactions on Computers
Boosting textual compression in optimal linear time
Journal of the ACM (JACM)
Superior Guarantees for Sequential Prediction and Lossless Compression via Alphabet Decomposition
The Journal of Machine Learning Research
Universal source controlled channel decoding with nonsystematic quick-look-in turbo codes
IEEE Transactions on Communications
Move-to-Front, Distance Coding, and Inversion Frequencies revisited
Theoretical Computer Science
On the possible patterns of inputs for block sorting in the Burrows-Wheeler transformation
Information Processing Letters
On families of new adaptive compression algorithms suitable for time-varying source data
ADVIS'04 Proceedings of the Third international conference on Advances in Information Systems
A fast and efficient nearly-optimal adaptive Fano coding scheme
Information Sciences: an International Journal
Revisiting bounded context block-sorting transformations
Software—Practice & Experience
Most Burrows-Wheeler based compressors are not optimal
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
The Burrows-Wheeler transform (BWT, 1994) is a reversible sequence transformation used in a variety of practical lossless source-coding algorithms. In each, the BWT is followed by a lossless source code that attempts to exploit the natural ordering of the BWT coefficients. BWT-based compression schemes are widely touted as low-complexity algorithms giving lossless coding rates better than those of the Ziv-Lempel codes (commonly known as LZ'77 and LZ'78) and almost as good as those achieved by prediction by partial matching (PPM) algorithms. To date, the coding performance claims have been made primarily on the basis of experimental results. This work gives a theoretical evaluation of BWT-based coding. The main results of this theoretical evaluation include: (1) statistical characterizations of the BWT output on both finite strings and sequences of length n → ∞, (2) a variety of very simple new techniques for BWT-based lossless source coding, and (3) proofs of the universality and bounds on the rates of convergence of both new and existing BWT-based codes for finite-memory and stationary ergodic sources. The end result is a theoretical justification and validation of the experimentally derived conclusions: BWT-based lossless source codes achieve universal lossless coding performance that converges to the optimal coding performance more quickly than the rate of convergence observed in Ziv-Lempel style codes and, for some BWT-based codes, within a constant factor of the optimal rate of convergence for finite-memory sources.
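For readers unfamiliar with the transform, the reversibility mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's construction: it is the textbook rotation-sorting formulation, and the end-of-string sentinel `$`, the function names `bwt` and `inverse_bwt`, and the naive quadratic-space inversion are all assumptions chosen for brevity rather than efficiency.

```python
def bwt(text, sentinel="$"):
    """Forward transform: sort all rotations, keep the last column.

    Assumes `sentinel` does not occur in `text` and sorts before every
    other character (true for "$" versus ASCII letters and digits).
    """
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, sentinel="$"):
    """Naive inverse: repeatedly prepend the last column and re-sort.

    After len(last_column) rounds the table holds every sorted rotation;
    the one ending in the sentinel is the original string.
    """
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]  # drop the sentinel


transformed = bwt("banana")
restored = inverse_bwt(transformed)
print(transformed, restored)
```

In the output of `bwt`, symbols that share a following context end up adjacent, which is the ordering a subsequent lossless source code would try to exploit. Practical implementations typically use suffix sorting instead of materializing every rotation.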