Statistical Models for Term Compression

Authors:
James Cheney
Venue:
DCC '00 Proceedings of the Conference on Data Compression
Year:
2000

Cited by 2:

Structuring labeled trees for optimal succinctness, and beyond
FOCS '05 Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science

Functional pearl: every bit counts
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming

Abstract

Symbolic tree data structures, or terms, are used in many computing systems. Although terms can be compressed by hand, with specialized algorithms, or with universal compression utilities, each of these approaches has drawbacks. We propose an approach that avoids these problems by using knowledge of term structure to obtain more accurate predictive models for term compression. We describe two models that predict child symbols from their parents and locations. Our experiments compared these models with first-order Markov sequence models using Huffman coding. One model obtains 20% better compression in similar time, and the other, simpler model obtains similar compression 40% faster. These compression models also approach, but do not exceed, the performance of gzip.
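
To make the idea of predicting a child symbol from its parent and location concrete, here is a minimal sketch in Python. It is not the paper's implementation. The term representation as `(symbol, children)` pairs, the reading of "location" as the child's index under its parent, and all function names are assumptions made for illustration. Instead of Huffman coding, the sketch counts ideal code lengths under empirical conditional frequencies, and it ignores the cost of transmitting the model itself.

```python
import math
from collections import Counter, defaultdict

# Assumed representation: a term is (symbol, [child terms]).

def parent_position_contexts(term, parent=None, index=0):
    """Yield (context, symbol) pairs with context = (parent symbol, child index)."""
    symbol, children = term
    yield (parent, index), symbol
    for i, child in enumerate(children):
        yield from parent_position_contexts(child, symbol, i)

def preorder_contexts(term):
    """Yield (previous symbol, symbol) pairs over the preorder traversal,
    i.e. the contexts of a first-order Markov sequence model."""
    prev = None
    stack = [term]
    while stack:
        symbol, children = stack.pop()
        yield prev, symbol
        prev = symbol
        stack.extend(reversed(children))  # visit children left to right

def ideal_code_length(pairs):
    """Sum of -log2 p(symbol | context) using empirical frequencies.
    This is an idealized bit count, not an actual Huffman encoding."""
    pairs = list(pairs)
    counts = defaultdict(Counter)
    for ctx, sym in pairs:
        counts[ctx][sym] += 1
    bits = 0.0
    for ctx, sym in pairs:
        total = sum(counts[ctx].values())
        bits -= math.log2(counts[ctx][sym] / total)
    return bits

# Toy term f(a, a, b): the symbol after "a" is ambiguous in the sequence
# view, but each argument position of "f" always holds the same symbol.
term = ("f", [("a", []), ("a", []), ("b", [])])
tree_bits = ideal_code_length(parent_position_contexts(term))
seq_bits = ideal_code_length(preorder_contexts(term))
print(tree_bits, seq_bits)
```

On this toy term every parent/position context is deterministic, so the idealized count is 0 bits, against 2 bits for the sequence model. The example is far too small to say anything about real data; it only shows where structural context can remove uncertainty that a sequence model has to pay for.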