Lossless Compression Based on the Sequence Memoizer

  • Authors:
  • Jan Gasthaus; Frank Wood; Yee Whye Teh

  • Venue:
  • DCC '10 Proceedings of the 2010 Data Compression Conference

  • Year:
  • 2010

Abstract

In this work we describe a sequence compression method that combines a Bayesian nonparametric sequence model with entropy encoding. The model, a hierarchy of Pitman-Yor processes of unbounded depth previously proposed by Wood et al. [16] in the context of language modelling, captures long-range dependencies by conditioning on contexts of unbounded length. We show that incremental approximate inference can be performed in this model, making it usable in a text compression setting. The resulting compressor reliably outperforms several PPM variants on many types of data, and is particularly effective on data that exhibits power-law properties.
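
The pipeline the abstract describes, online predictions from an unbounded-depth suffix model fed to an entropy coder, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses a single fixed discount (the real model ties discounts to context depth), a deterministic minimal-seating approximation in the spirit of interpolated Kneser-Ney rather than the paper's approximate inference scheme, a naive dictionary of context tuples in place of the Sequence Memoizer's compact suffix tree, and the ideal code length -log2 p(symbol | context) in place of an actual arithmetic coder. All names and the DISCOUNT value are hypothetical.

```python
import math
from collections import defaultdict

ALPHABET = 256   # byte alphabet, matching a byte-oriented text compressor
DISCOUNT = 0.5   # hypothetical fixed discount; the real model varies it with depth


class SuffixBackoffModel:
    """Unbounded-depth context model with Pitman-Yor-style discounting,
    approximated deterministically (minimal seating, as in interpolated
    Kneser-Ney). Contexts are stored naively in a dict keyed by tuples,
    which is quadratic in space; the Sequence Memoizer itself relies on
    a compact suffix tree to remain linear."""

    def __init__(self):
        # counts[context][symbol] = number of (approximate) customers
        self.counts = defaultdict(lambda: defaultdict(int))

    def prob(self, context, symbol):
        """Predictive probability of `symbol` after `context` (a tuple of
        ints), interpolating from the empty context out to the full one."""
        p = 1.0 / ALPHABET  # base distribution: uniform over bytes
        for start in range(len(context), -1, -1):
            table = self.counts.get(context[start:])
            if not table:
                break  # this suffix (hence every longer one) is unseen
            total = sum(table.values())
            c = table.get(symbol, 0)
            # Discounted count plus mass backed off to the shorter context.
            p = max(c - DISCOUNT, 0.0) / total + (DISCOUNT * len(table) / total) * p
        return p

    def update(self, context, symbol):
        """Incremental single-pass update: add the observation at the deepest
        context and propagate to shorter contexts only while the symbol is
        new there (the 'new table sends a customer to the parent' rule of
        the Chinese-restaurant representation)."""
        while context is not None:
            seen_before = self.counts[context][symbol] > 0
            self.counts[context][symbol] += 1
            if seen_before:
                break
            context = context[1:] if context else None


def ideal_code_length(data):
    """Bits an ideal entropy coder (e.g. arithmetic coding) would spend
    encoding `data` under the model's one-pass predictions."""
    model = SuffixBackoffModel()
    bits = 0.0
    for i, sym in enumerate(data):
        ctx = tuple(data[:i])              # unbounded conditioning context
        bits += -math.log2(model.prob(ctx, sym))
        model.update(ctx, sym)             # learn while compressing
    return bits


if __name__ == "__main__":
    text = b"abracadabra abracadabra abracadabra"
    bits = ideal_code_length(text)
    print(f"{bits:.1f} bits vs {8 * len(text)} bits raw")
```

A real compressor would pass each predictive distribution to an arithmetic coder, whose output length comes within a couple of bits of the ideal total computed above; the decoder recovers the sequence by maintaining the identical model state and updates on its side.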