On the performance of data compression algorithms based upon string matching

  • Authors:
  • En-hui Yang;J. C. Kieffer

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Waterloo Univ., Ont.;-

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2006

Quantified Score

Hi-index 754.90

Visualization

Abstract

Lossless and lossy data compression algorithms based on string matching are considered. In the lossless case, a result of Wyner and Ziv (1989) is extended. In the lossy case, a data compression algorithm based on approximate string matching is analyzed in the following two frameworks: (1) the database and the source together form a Markov chain of finite order; (2) the database and the source are independent with the database coming from a Markov model and the source from a general stationary, ergodic model. In either framework, it is shown that the resulting compression rate converges with probability one to a quantity computable as the infimum of an information theoretic functional over a set of auxiliary random variables; the quantity is strictly greater than the rate distortion function of the source except in some symmetric cases. In particular, this result implies that the lossy algorithm proposed by Steinberg and Gutman (1993) is not optimal, even for memoryless or Markov sources