Text Compression for Dynamic Document Databases

  • Authors:
  • Alistair Moffat;Justin Zobel;Neil Sharman

  • Affiliations:
  • -;-;-

  • Venue:
  • IEEE Transactions on Knowledge and Data Engineering
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

For compression of text databases, semi-static word-based methods provide good performance in terms of both speed and disk space, but two problems arise. First, the memory requirements for the compression model during decoding can be unacceptably high. Second, the need to handle document insertions means that the collection must be periodically recompressed if compression efficiency is to be maintained on dynamic collections. Here we show that with careful management the impact of both of these drawbacks can be kept small. Experiments with a word-based model and over 500 Mb of text show that excellent compression rates can be retained even in the presence of severe memory limitations on the decoder, and after significant expansion in the amount of stored text.