An analysis of the longest match and the greedy heuristics in text encoding

Authors:
Jyrki Katajainen; Timo Raita
Affiliations:
Univ. of Turku, Turku, Finland; Univ. of Turku, Turku, Finland
Venue:
Journal of the ACM (JACM)
Year:
1992

Abstract

Text compression is often done using a fixed, previously formed dictionary (code book) that specifies which substrings of the text can be replaced by code words. An optimal solution to the text-encoding problem always exists. Because the known optimal algorithms have long processing times, several heuristics have been proposed in the literature. In this paper, the worst-case compression gains obtained by the longest match and the greedy heuristics are studied for various types of dictionaries. For general dictionaries, the performance of the heuristics can be almost the weakest possible. In practice, however, dictionaries usually have properties that lead to a space-optimal or near-space-optimal coding result with the heuristics.
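To make the gap between a heuristic and an optimal encoding concrete, the following is a minimal Python sketch and not the authors' formulation. It assumes that every code word has the same cost (one unit per phrase), that the dictionary is a plain set of strings, and that "longest match" means taking the longest dictionary entry at the current position. The function names `longest_match_parse` and `optimal_parse` and the toy dictionary are hypothetical. The abstract does not define the greedy heuristic, so it is not modelled here.

```python
def longest_match_parse(text, dictionary):
    """Parse left to right, always taking the longest dictionary entry
    that matches at the current position (assumed reading of 'longest match')."""
    max_len = max(map(len, dictionary))
    phrases = []
    i = 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                phrases.append(candidate)
                i += length
                break
        else:
            raise ValueError(f"no dictionary entry matches at position {i}")
    return phrases


def optimal_parse(text, dictionary, cost=lambda phrase: 1):
    """Minimum-cost parse via dynamic programming over text positions.
    best[j] is the cheapest cost of encoding text[:j]; the unit cost per
    phrase is an assumption of this sketch."""
    n = len(text)
    inf = float("inf")
    best = [inf] * (n + 1)
    back = [None] * (n + 1)
    best[0] = 0
    max_len = max(map(len, dictionary))
    for i in range(n):
        if best[i] == inf:
            continue  # position i cannot be reached by any parse
        for length in range(1, min(max_len, n - i) + 1):
            phrase = text[i:i + length]
            if phrase in dictionary and best[i] + cost(phrase) < best[i + length]:
                best[i + length] = best[i] + cost(phrase)
                back[i + length] = i
    if best[n] == inf:
        raise ValueError("text cannot be encoded with this dictionary")
    # Walk the back pointers from the end of the text to recover the phrases.
    phrases = []
    j = n
    while j > 0:
        i = back[j]
        phrases.append(text[i:j])
        j = i
    return phrases[::-1]


toy_dictionary = {"a", "b", "c", "ab", "bcc"}  # hypothetical example dictionary
heuristic = longest_match_parse("abcc", toy_dictionary)
optimal = optimal_parse("abcc", toy_dictionary)
print(heuristic, optimal)
```

On this toy input the longest-match parse commits to "ab" and then needs two more code words, while the dynamic program finds the two-phrase parse "a", "bcc". The dynamic program considers every dictionary entry at every position, which hints at why faster heuristics are attractive when texts and dictionaries are large.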