The relationship between greedy parsing and symbolwise text compression

Authors:
Timothy C. Bell; Ian H. Witten
Affiliations:
Univ. of Canterbury, Christchurch, New Zealand; Univ. of Waikato, Hamilton, New Zealand
Venue:
Journal of the ACM (JACM)
Year:
1994



Abstract

Text compression methods can be divided into two classes: symbolwise and parsing. Symbolwise methods assign codes to individual symbols, while parsing methods assign codes to groups of consecutive symbols (phrases). The set of phrases available to a parsing method is referred to as a dictionary. The vast majority of parsing methods in the literature use greedy parsing (including nearly all variations of the popular Ziv-Lempel methods). When greedy parsing is used, the coder processes a string from left to right, at each step encoding as many symbols as possible with a phrase from the dictionary. This parsing strategy is not optimal, but an optimal method cannot guarantee a bounded coding delay.

An important problem in compression research has been to establish the relationship between symbolwise methods and parsing methods. This paper extends prior work that shows that there are symbolwise methods that simulate a subset of greedy parsing methods. We provide a more general algorithm that takes any nonadaptive greedy parsing method and constructs a symbolwise method that achieves exactly the same compression. Combined with the existence of symbolwise equivalents for two of the most significant adaptive parsing methods, this result gives added weight to the idea that research aimed at increasing compression should concentrate on symbolwise methods, while parsing methods should be chosen for speed or temporary storage considerations.
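
The greedy parsing strategy described in the abstract, and the sense in which it is not optimal, can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a fixed (nonadaptive) dictionary in which every phrase costs the same number of bits, so a parse with fewer phrases compresses better. The function names and the toy dictionary are hypothetical, chosen only for illustration.

```python
def greedy_parse(text, dictionary):
    """Parse left to right, always taking the longest matching phrase."""
    max_len = max(len(p) for p in dictionary)
    phrases = []
    i = 0
    while i < len(text):
        # Try the longest possible phrase first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary phrase matches at position {i}")
    return phrases


def optimal_parse(text, dictionary):
    """Find a parse with the fewest phrases by dynamic programming.

    Assumption: all phrases have equal code length, so fewest phrases
    means shortest output. The table is filled from the end of the text
    backwards, so nothing can be emitted until the whole input is seen.
    """
    n = len(text)
    best = [None] * (n + 1)  # best[i] is a fewest-phrase parse of text[i:]
    best[n] = []
    for i in range(n - 1, -1, -1):
        for phrase in dictionary:
            rest = best[i + len(phrase)] if i + len(phrase) <= n else None
            if rest is not None and text.startswith(phrase, i):
                candidate = [phrase] + rest
                if best[i] is None or len(candidate) < len(best[i]):
                    best[i] = candidate
    if best[0] is None:
        raise ValueError("text cannot be parsed with this dictionary")
    return best[0]


# Hypothetical toy dictionary; every phrase is assumed to cost the same.
toy_dictionary = {"a", "b", "ab", "bbb"}
print(greedy_parse("abbb", toy_dictionary))   # ['ab', 'b', 'b']
print(optimal_parse("abbb", toy_dictionary))  # ['a', 'bbb']
```

On the string "abbb" the greedy parser commits to "ab" and is then forced to use three phrases, while the exhaustive parser finds a two-phrase parse. The exhaustive version in this sketch inspects the entire string before producing any output, which is one simple way to see why an optimal parser cannot promise a bounded coding delay.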