Most Burrows-Wheeler Based Compressors Are Not Optimal

Authors:
Haim Kaplan; Elad Verbin
Affiliations:
School of Computer Science, Tel Aviv University, Tel Aviv, Israel; School of Computer Science, Tel Aviv University, Tel Aviv, Israel
Venue:
CPM '07: Proceedings of the 18th Annual Conference on Combinatorial Pattern Matching
Year:
2007

Abstract

We present a technique for proving lower bounds on the compression ratio of algorithms based on the Burrows-Wheeler Transform (BWT). We study three well-known BWT-based compressors: the original algorithm suggested by Burrows and Wheeler; BWT with distance coding; and BWT with run-length encoding. For each compressor, we exhibit a Markov source such that, for asymptotically large text generated by the source, the compression ratio divided by the entropy of the source is a constant greater than 1. This constant is 2 - ε, 1.26, and 1.29 for the three compressors, respectively. Our technique is robust and can be used to prove similar claims for most BWT-based compressors (with a few notable exceptions). This stands in contrast to statistical compressors and Lempel-Ziv-style dictionary compressors, which have long been known to be optimal, in the sense that for any Markov source, the compression ratio divided by the entropy of the source asymptotically tends to 1. We experimentally corroborate our theoretical bounds. Furthermore, we compare BWT-based compressors to other compressors and show that for "realistic" Markov sources they indeed perform poorly, often worse than other compressors. This is in contrast with the well-known fact that on English text, BWT-based compressors are superior to many other types of compressors.
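
To make the quantity discussed in the abstract concrete, the following is a minimal sketch, not the authors' experimental setup. It assumes the commonly described form of the original Burrows-Wheeler pipeline (BWT, then move-to-front, then an order-0 coder) and uses the order-0 empirical entropy of the move-to-front output as a rough stand-in for the cost of that final coder. The function names, the `$` sentinel, the two-state source, and the parameters `p` and `q` are all illustrative assumptions.

```python
import math
import random
from collections import Counter


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler Transform via sorted rotations (fine for small inputs only)."""
    assert sentinel not in text
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def move_to_front(text):
    """Encode each symbol as its index in a recency list, then move it to the front."""
    recency = sorted(set(text))
    out = []
    for ch in text:
        i = recency.index(ch)
        out.append(i)
        recency.insert(0, recency.pop(i))
    return out


def entropy0(seq):
    """Order-0 empirical entropy of a sequence, in bits per symbol."""
    n = len(seq)
    return -sum(c / n * math.log2(c / n) for c in Counter(seq).values())


def binary_entropy(x):
    """Entropy in bits of a biased coin with success probability x."""
    if x in (0.0, 1.0):
        return 0.0
    return -x * math.log2(x) - (1 - x) * math.log2(1 - x)


def markov_entropy_rate(p, q):
    """Entropy rate of a two-state chain with P(0->1) = p and P(1->0) = q."""
    pi0 = q / (p + q)  # stationary probability of state 0
    return pi0 * binary_entropy(p) + (1 - pi0) * binary_entropy(q)


def sample_markov(n, p, q, seed=0):
    """Draw n symbols ('0'/'1') from the two-state chain, starting in state 0."""
    rng = random.Random(seed)
    state, out = 0, []
    for _ in range(n):
        out.append(str(state))
        flip = p if state == 0 else q
        if rng.random() < flip:
            state = 1 - state
    return "".join(out)


if __name__ == "__main__":
    p, q = 0.1, 0.3  # hypothetical transition probabilities
    text = sample_markov(2000, p, q)
    codes = move_to_front(bwt(text))
    # Proxy for "compression ratio divided by source entropy"; includes the sentinel.
    print(entropy0(codes) / markov_entropy_rate(p, q))
```

The printed ratio is only a finite-sample proxy: the bounds in the abstract concern asymptotically large text and specific Markov sources constructed for each compressor, which this toy source does not reproduce.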