On parsing optimality for dictionary-based text compression - the Zip case

Authors:
Alessio Langiu
Affiliations:
-
Venue:
Journal of Discrete Algorithms
Year:
2013

References (23):

Data compression with finite windows. Communications of the ACM.
Algorithms on strings, trees, and sequences: computer science and computational biology.
On the optimality of parsing in dynamic dictionary based data compression. Proceedings of the tenth annual ACM-SIAM symposium on Discrete algorithms.
Data compression via textual substitution. Journal of the ACM (JACM).
Common phrases and minimum-space text storage. Communications of the ACM.
Parsing with suffix and prefix dictionaries. DCC '96 Proceedings of the Conference on Data Compression.
The effect of non-greedy parsing in Ziv-Lempel compression methods. DCC '95 Proceedings of the Conference on Data Compression.
Replacing suffix trees with enhanced suffix arrays. Journal of Discrete Algorithms - SPIRE 2002.
Asymptotical Optimality of Two Variations of Lempel-Ziv Codes for Sources with Countably Infinite Alphabet. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences.
Data Compression: The Complete Reference.
Algorithms on Strings.
A Technique for High-Performance Data Compression. Computer.
Computing Longest Previous Factor in linear time and applications. Information Processing Letters.
A Simple Algorithm for Computing the Lempel Ziv Factorization. DCC '08 Proceedings of the Data Compression Conference.
On the bit-complexity of Lempel-Ziv compression. SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms.
LPF Computation Revisited. Combinatorial Algorithms.
Dictionary-symbolwise flexible parsing. IWOCA'10 Proceedings of the 21st international conference on Combinatorial algorithms.
Quasi-distinct parsing and optimal compression methods. Theoretical Computer Science.
Efficient algorithms for three variants of the LPF table. Journal of Discrete Algorithms.
Dictionary-symbolwise flexible parsing. Journal of Discrete Algorithms.
On the Complexity of Finite Sequences. IEEE Transactions on Information Theory.
A universal algorithm for sequential data compression. IEEE Transactions on Information Theory.
Compression of individual sequences via variable-rate coding. IEEE Transactions on Information Theory.

Abstract

Dictionary-based compression schemes have been the most commonly used data compression schemes since they appeared in the foundational 1977 paper of Ziv and Lempel, whose scheme is generally referred to as LZ77. Their work is the basis of Zip, gzip, 7-Zip and many other compression utilities. Some of these compression schemes use variants of the greedy approach to parse the text into dictionary phrases; others have abandoned the greedy approach to improve the compression ratio. Recently, two bit-optimal parsing algorithms have been presented, filling the gap between theory and best practice. We present a survey of the parsing problem for dictionary-based text compression, identifying notable results of both a theoretical and a practical nature that have appeared in the last three decades. We follow the historical steps of the Zip scheme, showing how the original optimal parsing problem of finding a parse formed by the minimum number of phrases has been replaced by the bit-optimal parsing problem, where the goal is to minimise the length in bits of the encoded text.
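
The difference between greedy parsing, minimum-phrase parsing and bit-optimal parsing can be illustrated with a small sketch. It is not taken from the paper: it assumes a fixed (static) dictionary instead of the sliding-window dictionary of LZ77, and the names `greedy_parse`, `optimal_parse` and the per-phrase bit costs in `bit_cost` are hypothetical, chosen only for illustration. Non-greedy parsing is modelled here as a shortest path over text positions, where each dictionary phrase is an edge weighted by its cost.

```python
from typing import Callable, List, Set


def greedy_parse(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy parsing: at each position take the longest matching phrase."""
    max_len = max(len(p) for p in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try candidate lengths from longest to shortest.
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in dictionary:
                phrases.append(text[i:i + length])
                i += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {i}")
    return phrases


def optimal_parse(
    text: str,
    dictionary: Set[str],
    cost: Callable[[str], float] = lambda phrase: 1,
) -> List[str]:
    """Minimum-cost parse via shortest path over positions 0..len(text).

    With the default unit cost this minimises the number of phrases;
    with a bit-length cost it minimises the encoded size in bits.
    """
    n = len(text)
    max_len = max(len(p) for p in dictionary)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[j]: cheapest cost to parse text[:j]
    back = [0] * (n + 1)     # back[j]: start of the last phrase ending at j
    best[0] = 0
    for i in range(n):
        if best[i] == inf:
            continue  # position i is unreachable
        for length in range(1, min(max_len, n - i) + 1):
            phrase = text[i:i + length]
            if phrase in dictionary:
                candidate = best[i] + cost(phrase)
                if candidate < best[i + length]:
                    best[i + length] = candidate
                    back[i + length] = i
    if best[n] == inf:
        raise ValueError("text cannot be parsed with this dictionary")
    # Walk the back pointers from the end of the text to recover the parse.
    phrases = []
    j = n
    while j > 0:
        i = back[j]
        phrases.append(text[i:j])
        j = i
    return phrases[::-1]


# Hypothetical static dictionary and per-phrase code lengths in bits.
dictionary = {"a", "b", "ab", "bbb"}
bit_cost = {"a": 1, "b": 1, "ab": 5, "bbb": 9}

greedy = greedy_parse("abbb", dictionary)
fewest_phrases = optimal_parse("abbb", dictionary)
fewest_bits = optimal_parse("abbb", dictionary, cost=bit_cost.__getitem__)
```

On this toy input the three strategies give three different parses: greedy yields `ab, b, b`, the minimum-phrase parse is `a, bbb`, and under the assumed code lengths the bit-optimal parse is `a, b, b, b`. This shows why minimising the number of phrases need not minimise the encoded length when phrases are coded with variable-length codes.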