An analysis of the longest match and the greedy heuristics in text encoding

  • Authors: Jyrki Katajainen; Timo Raita
  • Affiliations: Univ. of Turku, Turku, Finland; Univ. of Turku, Turku, Finland
  • Venue: Journal of the ACM (JACM)
  • Year: 1992

Abstract

Text compression is often done using a fixed, previously formed dictionary (code book) that expresses which substrings of the text can be replaced by code words. There always exists an optimal solution for the text-encoding problem. Due to the long processing times of the various optimal algorithms, several heuristics have been proposed in the literature. In this paper, the worst-case compression gains obtained by the longest match and the greedy heuristics for various types of dictionaries are studied. For general dictionaries, the performance of the heuristics can be almost the weakest possible. In practice, however, dictionaries usually have properties that lead to a space-optimal or near-space-optimal coding result with the heuristics.
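
To make the contrast between a heuristic and an optimal encoding concrete, the following is a minimal Python sketch, not the authors' algorithms. It assumes that every code word costs the same (so the goal is simply to use as few dictionary strings as possible) and that the dictionary can cover the whole text. The function names `longest_match_parse` and `optimal_parse`, and the toy dictionary, are illustrative assumptions that do not come from the paper.

```python
def longest_match_parse(text, dictionary):
    """Longest match heuristic: at each position, take the longest
    dictionary string that matches the remaining text."""
    max_len = max(map(len, dictionary))
    parse, i = [], 0
    while i < len(text):
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in dictionary:
                parse.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no dictionary string matches at position {i}")
    return parse


def optimal_parse(text, dictionary):
    """Optimal parse under the assumed uniform code-word cost, found by
    dynamic programming from the end of the text to the start."""
    n = len(text)
    max_len = max(map(len, dictionary))
    inf = float("inf")
    best = [inf] * (n + 1)    # best[i]: fewest code words for text[i:]
    choice = [None] * (n + 1)  # choice[i]: first string of that best parse
    best[n] = 0
    for i in range(n - 1, -1, -1):
        for length in range(1, min(max_len, n - i) + 1):
            piece = text[i:i + length]
            if piece in dictionary and best[i + length] + 1 < best[i]:
                best[i] = best[i + length] + 1
                choice[i] = piece
    if best[0] == inf:
        raise ValueError("text cannot be encoded with this dictionary")
    parse, i = [], 0
    while i < n:
        parse.append(choice[i])
        i += len(choice[i])
    return parse


# Toy dictionary (an assumption for illustration only).
toy_dictionary = {"a", "b", "c", "d", "ab", "bcd"}
heuristic = longest_match_parse("abcd", toy_dictionary)
optimal = optimal_parse("abcd", toy_dictionary)
print(heuristic, optimal)
```

On the text `abcd`, the longest match heuristic commits to `ab` and then needs `c` and `d`, while the optimal parse uses `a` followed by `bcd`. This is a small instance of the gap that the worst-case analysis quantifies.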