Is text compression by prefixes and suffixes practical?

  • Authors: A. S. Fraenkel; M. Mor; Y. Perl
  • Affiliations: The Weizmann Institute of Science, Rehovot, Israel (partial affiliation with IRCOL); The Weizmann Institute of Science, Rehovot, Israel; Bar-Ilan University, Ramat Gan, Israel (partial affiliation with IRCOL)
  • Venue: SIGIR '82: Proceedings of the 5th Annual ACM Conference on Research and Development in Information Retrieval
  • Year: 1982

Abstract

One approach to text compression is to replace high-frequency, variable-length word fragments with fixed-length codes that point into a compression table holding those fragments. It is shown that finding an optimal fragment compression is NP-hard even when the fragments are restricted to prefixes and suffixes. This appears to be the simplest NP-hard fragment-compression problem, since a polynomial-time algorithm for compressing by prefixes only (or by suffixes only) was found recently. Various heuristics that use both prefixes and suffixes were tested on large Hebrew and English texts. With a prefix/suffix compression table of 256 entries, the best of these heuristics achieve a net compression of about 37% for Hebrew and 45% for English.
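To make the prefix/suffix idea concrete, here is a minimal Python sketch of one naive heuristic. It is not the authors' algorithm, and all function names are hypothetical. The scoring rule (occurrence count times letters saved, minus the cost of storing the entry), the greedy longest-prefix-then-longest-suffix parse, and the cost model (each letter and each table code costs one unit, as a one-byte code for a 256-entry table might; decodability of letters versus codes is ignored) are assumptions chosen for illustration. Because the score ignores overlaps between fragments, it can overstate the real saving, which is one reason such heuristics are not optimal.

```python
from collections import Counter

# Hypothetical cost model (an assumption, not the paper's): every letter costs
# one unit and every reference to the table costs one unit (e.g. one byte for a
# 256-entry table); storing a table entry costs its length in letters.


def candidate_fragments(words, min_len=2):
    """Count every prefix and suffix of length >= min_len (whole words included)."""
    prefixes, suffixes = Counter(), Counter()
    for w in words:
        for k in range(min_len, len(w) + 1):
            prefixes[w[:k]] += 1
            suffixes[w[-k:]] += 1
    return prefixes, suffixes


def build_table(words, table_size=256, min_len=2):
    """Greedy heuristic: keep the fragments with the largest estimated saving.

    The estimate count * (len - 1) - len ignores overlaps between fragments,
    so it can overstate the real saving.
    """
    prefixes, suffixes = candidate_fragments(words, min_len)
    scored = []
    for kind, counts in (("P", prefixes), ("S", suffixes)):
        for frag, n in counts.items():
            gain = n * (len(frag) - 1) - len(frag)
            if gain > 0:
                scored.append((gain, kind, frag))
    scored.sort(reverse=True)
    chosen = scored[:table_size]
    prefix_set = {frag for _, kind, frag in chosen if kind == "P"}
    suffix_set = {frag for _, kind, frag in chosen if kind == "S"}
    return prefix_set, suffix_set


def encode_word(word, prefix_set, suffix_set):
    """Split a word into (table prefix, literal middle, table suffix).

    Greedy parse: take the longest table prefix, then the longest table
    suffix of what remains, so the two never overlap.
    """
    prefix = next((word[:k] for k in range(len(word), 0, -1)
                   if word[:k] in prefix_set), "")
    rest = word[len(prefix):]
    suffix = next((rest[-k:] for k in range(len(rest), 0, -1)
                   if rest[-k:] in suffix_set), "")
    return prefix, rest[:len(rest) - len(suffix)], suffix


def net_compression(words, prefix_set, suffix_set):
    """Return 1 - (encoded size + table size) / original size."""
    original = sum(len(w) for w in words)
    encoded = 0
    for w in words:
        prefix, middle, suffix = encode_word(w, prefix_set, suffix_set)
        encoded += (1 if prefix else 0) + len(middle) + (1 if suffix else 0)
    table = sum(len(f) for f in prefix_set) + sum(len(f) for f in suffix_set)
    return 1 - (encoded + table) / original
```

For example, `encode_word("unhelpful", {"un"}, {"ful"})` returns `("un", "help", "ful")`, which this cost model charges 1 + 4 + 1 = 6 units instead of 9. Charging the table's own storage inside `net_compression` mirrors the abstract's notion of *net* compression, under the stated cost assumptions.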