Statistical Models for Term Compression

  • Authors: James Cheney
  • Affiliations: -
  • Venue: DCC '00 Proceedings of the Conference on Data Compression
  • Year: 2000

Abstract

Symbolic tree data structures, or terms, are used in many computing systems. Terms can be compressed by hand, with specialized algorithms, or with universal compression utilities, but all of these approaches have drawbacks. We propose an approach that avoids these problems by using knowledge of term structure to obtain more accurate predictive models for term compression. We describe two models that predict child symbols based on their parents and locations. Our experiments compared these models with first-order Markov sequence models using Huffman coding and found that one model obtains 20% better compression in similar time, while the other, simpler model obtains similar compression 40% faster. These compression models also approach, but do not exceed, the performance of gzip.
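
The idea of predicting a child symbol from its parent and location can be illustrated with a minimal Python sketch. This is not the paper's implementation. The term representation (a `(symbol, children)` tuple), the function names, and the example term are all assumptions made for illustration. The cost measure is the ideal code length under empirical conditional frequencies, not Huffman coding, and it ignores the cost of transmitting the model itself.

```python
import math
from collections import Counter, defaultdict

# Assumed representation: a term is a tuple (symbol, [child terms]).

def preorder(term):
    """Yield the symbols of a term in preorder (parent before children)."""
    sym, kids = term
    yield sym
    for kid in kids:
        yield from preorder(kid)

def sequence_contexts(term):
    """First-order Markov sequence model: context = previous preorder symbol."""
    prev = "<start>"
    for sym in preorder(term):
        yield prev, sym
        prev = sym

def parent_position_contexts(term, parent="<root>", index=0):
    """Structural model: context = (parent symbol, index among its children)."""
    sym, kids = term
    yield (parent, index), sym
    for i, kid in enumerate(kids):
        yield from parent_position_contexts(kid, sym, i)

def empirical_cost_bits(pairs):
    """Ideal code length in bits when each symbol is coded with its
    empirical frequency within its context (model cost not included)."""
    counts = defaultdict(Counter)
    for ctx, sym in pairs:
        counts[ctx][sym] += 1
    bits = 0.0
    for ctr in counts.values():
        total = sum(ctr.values())
        for n in ctr.values():
            bits -= n * math.log2(n / total)
    return bits

# Hypothetical toy term: root(p(s(z), t), q(s(z), u)).
example = ("root", [
    ("p", [("s", [("z", [])]), ("t", [])]),
    ("q", [("s", [("z", [])]), ("u", [])]),
])

print(empirical_cost_bits(sequence_contexts(example)))         # 2.0
print(empirical_cost_bits(parent_position_contexts(example)))  # 0.0
```

In this toy term, the sequence model sees `z` immediately before both `t` and `u` and so cannot tell them apart, whereas the structural context (`p` or `q` as parent, second child position) determines each one. The figures describe only this constructed example and say nothing about the compression ratios reported in the abstract.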