The zero-frequency problem: estimating the probabilities of novel events in adaptive text compression

  • Authors:
  • I. H. Witten;T. C. Bell

  • Affiliations:
  • Dept. of Comput. Sci., Calgary Univ., Alta.;-

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2006

Quantified Score

Hi-index 754.84

Visualization

Abstract

Approaches to the zero-frequency problem in adaptive text compression are discussed. This problem relates to the estimation of the likelihood of a novel event occurring. Although several methods have been used, their suitability has been on empirical evaluation rather than a well-founded model. The authors propose the application of a Poisson process model of novelty. Its ability to predict novel tokens is evaluated, and it consistently outperforms existing methods. It is applied to a practical statistical coding scheme, where a slight modification is required to avoid divergence. The result is a well-founded zero-frequency model that explains observed differences in the performance of existing methods, and offers a small improvement in the coding efficiency of text compression over the best method previously known