On the bit-complexity of Lempel-Ziv compression

Authors:
Paolo Ferragina;Igor Nitto;Rossano Venturini
Affiliations:
University of Pisa, Italy;University of Pisa, Italy;University of Pisa, Italy
Venue:
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Year:
2009

Citing 25
Cited 7

Parallel algorithms for data compression

Journal of the ACM (JACM)
Data compression with finite windows

Communications of the ACM
An approximation algorithm for space-optimal encoding of a text

The Computer Journal
An analysis of the longest match and the greedy heuristics in text encoding

Journal of the ACM (JACM)
Optimal prefetching via data compression

Journal of the ACM (JACM)
Greedy algorithms for on-line data compression

Journal of Algorithms
Optimal bounds for the predecessor problem

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
On the optimality of parsing in dynamic dictionary based data compression

Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms
Compression of Low Entropy Strings with Lempel--Ziv Algorithms

SIAM Journal on Computing
Examining Computational Geometry, Van Emde Boas Trees, and Hashing from the Perspective of the Fusion Tree

SIAM Journal on Computing
Online timestamped text indexing

Information Processing Letters
Managing Gigabytes: Compressing and Indexing Documents and Images

Managing Gigabytes: Compressing and Indexing Documents and Images
Introduction to Algorithms

Introduction to Algorithms
Parsing with suffix and prefix dictionaries

DCC '96 Proceedings of the Conference on Data Compression
UbiCrawler: a scalable fully distributed web crawler

Software—Practice & Experience
Substring compression problems

SODA '05 Proceedings of the sixteenth annual ACM-SIAM symposium on Discrete algorithms
Boosting textual compression in optimal linear time

Journal of the ACM (JACM)
Data Compression: The Complete Reference

Data Compression: The Complete Reference
Succinct ordinal trees with level-ancestor queries

ACM Transactions on Algorithms (TALG)
Compressed full-text indexes

ACM Computing Surveys (CSUR)
The engineering of a compression boosting library: theory vs practice in BWT compression

ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
A Simple Algorithm for Computing the Lempel Ziv Factorization

DCC '08 Proceedings of the Data Compression Conference
Clustering by compression

IEEE Transactions on Information Theory
Classification With Finite Memory Revisited

IEEE Transactions on Information Theory
Space-conscious compression

MFCS'07 Proceedings of the 32nd international conference on Mathematical Foundations of Computer Science

On compressing the textual web

Proceedings of the third ACM international conference on Web search and data mining
Data structures: time, I/Os, entropy, joules!

ESA'10 Proceedings of the 18th annual European conference on Algorithms: Part II
Dictionary-symbolwise flexible parsing

IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms
Dictionary-symbolwise flexible parsing

Journal of Discrete Algorithms
Near real-time suffix tree construction via the fringe marked ancestor problem

Journal of Discrete Algorithms
Optimized relative Lempel-Ziv compression of genomes

ACSC '11 Proceedings of the Thirty-Fourth Australasian Computer Science Conference - Volume 113
On parsing optimality for dictionary-based text compression-the Zip case

Journal of Discrete Algorithms

Abstract

One of the most famous and investigated lossless data-compression schemes is the one introduced by Lempel and Ziv about 30 years ago [37]. This compression scheme is known as a "dictionary-based compressor" and compresses an input string by replacing some of its substrings with (shorter) codewords that are pointers to a dictionary of phrases built as the string is processed. Surprisingly enough, although many fundamental results are now known about the speed and effectiveness of this compression process (see e.g. [23, 28] and references therein), "we are not aware of any parsing scheme that achieves optimality when the LZ77-dictionary is in use under any constraint on the codewords other than being of equal length" [28, p. 159]. Here optimality means achieving the minimum number of bits in compressing each individual input string, without any assumption on its generating source. In this paper we investigate three issues pertaining to the bit-complexity of LZ-based compressors, and we design algorithms which achieve bit-optimality in the compressed output size in efficient or optimal time and optimal space. These theoretical results are supported by experiments that compare our novel LZ-based compressors against the most popular compression tools (like gzip, bzip2) and state-of-the-art compressors (like the booster of [14, 13]).
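To make the notion of bit-complexity concrete, the following minimal sketch parses a string with the classic greedy (longest-match) LZ77 rule and then counts the bits of the output under a variable-length codeword model. It is an illustration only and not the algorithm of the paper: the unbounded window, the quadratic match search, the Elias-gamma encoding of distances and lengths, the 1-bit flag plus 8-bit literal cost, and all function names (`gamma_bits`, `greedy_lz77`, `decode`, `bit_cost`) are assumptions chosen for simplicity.

```python
def gamma_bits(n: int) -> int:
    """Length in bits of the Elias-gamma code of an integer n >= 1."""
    return 2 * n.bit_length() - 1


def greedy_lz77(text: str):
    """Greedy longest-match LZ77 parse with an unbounded window.

    Returns phrases of the form ('lit', char) or ('copy', distance, length).
    The match search is a naive O(n^2) scan, kept simple on purpose.
    """
    phrases = []
    i, n = 0, len(text)
    while i < n:
        best_len, best_dist = 0, 0
        for j in range(i):
            l = 0
            # Matches may overlap the current position, as LZ77 allows.
            while i + l < n and text[j + l] == text[i + l]:
                l += 1
            # On ties prefer the later start, i.e. the smaller distance.
            if l > 0 and l >= best_len:
                best_len, best_dist = l, i - j
        if best_len == 0:
            phrases.append(("lit", text[i]))
            i += 1
        else:
            phrases.append(("copy", best_dist, best_len))
            i += best_len
    return phrases


def decode(phrases) -> str:
    """Rebuild the text from the phrases, copying one character at a time."""
    out = []
    for p in phrases:
        if p[0] == "lit":
            out.append(p[1])
        else:
            _, dist, length = p
            for _ in range(length):
                out.append(out[-dist])
    return "".join(out)


def bit_cost(phrases) -> int:
    """Output size in bits under the assumed (hypothetical) cost model:
    1 flag bit per phrase, 8 bits per literal, gamma codes for copies."""
    total = 0
    for p in phrases:
        if p[0] == "lit":
            total += 1 + 8
        else:
            total += 1 + gamma_bits(p[1]) + gamma_bits(p[2])
    return total


if __name__ == "__main__":
    parse = greedy_lz77("abababab")
    print(parse, bit_cost(parse))
```

Because a copy's cost here depends on the magnitude of its distance and length, two parsings with the same number of phrases can differ in total bits. That gap between phrase count and bit count is what the abstract's notion of bit-optimality addresses; the greedy rule above makes no attempt to close it.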