Improved Antidictionary Based Compression

  • Authors:
  • Maxime Crochemore; Gonzalo Navarro

  • Venue:
  • SCCC '02 Proceedings of the XII International Conference of the Chilean Computer Science Society
  • Year:
  • 2002

Abstract

The compression of binary texts using antidictionaries is a novel technique based on the fact that some substrings (called "antifactors") never appear in the text. Let sb be an antifactor, where b is its last bit. Every time s appears in the text, we know that the next bit cannot be b, so it must be the complement of b, and hence we omit its representation. Since building the set of all antifactors is space-consuming at compression time, it is customary to limit the maximum length of the antifactors considered to a constant k. A larger k yields better compression of the text but requires more space at compression time.

In this paper we introduce the notion of almost antifactors, which are strings that rarely appear in the text. More formally, almost antifactors are strings such that, if we treat them as antifactors and code their occurrences separately as exceptions, the compression ratio improves. We show that almost antifactors improve compression when only a limited amount of main memory is available at compression time. Our experiments show that they achieve the same compression as the classical algorithm using only 30% to 55% of its memory space.
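
The classical scheme summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`build_antidictionary`, `predicted_bit`, `compress`, `decompress`) are hypothetical, the antidictionary is built by brute force rather than with the automata used in practice, it is assumed to be handed to the decoder together with the text length, and almost antifactors with their exceptions are not covered.

```python
def build_antidictionary(text, k):
    """Antifactors s+b of length <= k whose prefix s occurs in text (brute force)."""
    n = len(text)
    factors = {text[i:i + j] for j in range(1, k + 1) for i in range(n - j + 1)}
    prefixes = {""} | {f for f in factors if len(f) < k}
    return {s + b for s in prefixes for b in "01" if s + b not in factors}


def predicted_bit(context, antidict):
    """Return the forced next bit if some suffix s of context has s+b forbidden."""
    for j in range(len(context) + 1):
        s = context[len(context) - j:]
        for b in "01":
            if s + b in antidict:
                # s+b never occurs, so the bit after s is the complement of b.
                return "1" if b == "0" else "0"
    return None


def compress(text, k):
    """Keep only the bits that the antidictionary cannot predict."""
    antidict = build_antidictionary(text, k)
    kept = []
    for i, bit in enumerate(text):
        context = text[max(0, i - (k - 1)):i]  # last k-1 bits seen so far
        if predicted_bit(context, antidict) is None:
            kept.append(bit)
    return "".join(kept), antidict, len(text)


def decompress(kept, antidict, n, k):
    """Rebuild the text, reading a stored bit only when none is predicted."""
    out = []
    stored = iter(kept)
    for _ in range(n):
        context = "".join(out[max(0, len(out) - (k - 1)):])
        guess = predicted_bit(context, antidict)
        out.append(guess if guess is not None else next(stored))
    return "".join(out)


if __name__ == "__main__":
    text = "0101010101"
    kept, antidict, n = compress(text, k=3)
    assert decompress(kept, antidict, n, k=3) == text
    print(len(text), "bits in,", len(kept), "bits kept")
```

In this sketch, raising `k` lets the encoder omit more bits but enlarges the set of factors held in memory while compressing, which is the trade-off the abstract describes.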