The Burrows-Wheeler Transform (also known as Block-Sorting) is the basis of compression algorithms that are the state of the art in lossless data compression. In this paper, we analyze two algorithms that use this technique. The first is the original algorithm described by Burrows and Wheeler, which, despite its simplicity, outperforms the Gzip compressor. The second uses an additional run-length encoding step to improve compression. We prove that the compression ratio of both algorithms can be bounded in terms of the k-th order empirical entropy of the input string for any k ≥ 0. We make no assumptions about the input, and the bounds we obtain hold in the worst case, that is, for every possible input string. All previous results for Block-Sorting algorithms concerned the average compression ratio and were established under the assumption that the input comes from a finite-order Markov source.
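To make the two quantities in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes the usual textbook definitions: the transform is computed naively by sorting all rotations of the input with an end-of-string sentinel appended, and the k-th order empirical entropy is the average, over all length-k contexts, of the zeroth-order entropy of the symbols that follow each context. The function names `bwt` and `empirical_entropy` and the choice of sentinel are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def bwt(s, sentinel="\0"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    Assumption: `sentinel` does not occur in `s` and sorts before every
    other symbol. This quadratic-space version is for illustration only.
    """
    if sentinel in s:
        raise ValueError("sentinel must not occur in the input")
    t = s + sentinel
    rotations = sorted(t[i:] + t[:i] for i in range(len(t)))
    return "".join(rotation[-1] for rotation in rotations)


def empirical_entropy(s, k=0):
    """k-th order empirical entropy of s, in bits per symbol.

    For each length-k context w occurring in s, collect the symbols that
    immediately follow w, compute their zeroth-order entropy, and weight
    it by how many symbols follow w. The sum is normalised by len(s).
    """
    n = len(s)
    if n == 0:
        return 0.0
    followers = defaultdict(Counter)
    for i in range(k, n):
        followers[s[i - k:i]][s[i]] += 1
    total_bits = 0.0
    for counts in followers.values():
        m = sum(counts.values())
        total_bits -= sum(c * math.log2(c / m) for c in counts.values())
    return total_bits / n


if __name__ == "__main__":
    text = "banana"
    print(repr(bwt(text)))
    for order in range(3):
        print(order, empirical_entropy(text, order))
```

The entropy function makes no probabilistic assumption about the source: it is defined on a single string, which is what allows worst-case bounds of the kind stated above.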