Universal lossless compression via multilevel pattern matching

  • Authors:
  • J. C. Kieffer;En-Hui Yang;G. J. Nelson;P. Cosman

  • Affiliations:
  • Dept. of Electr. & Comput. Eng., Minnesota Univ., Minneapolis, MN;-;-;-

  • Venue:
  • IEEE Transactions on Information Theory
  • Year:
  • 2006

Quantified Score

Hi-index 754.84

Visualization

Abstract

A universal lossless data compression code called the multilevel pattern matching code (MPM code) is introduced. In processing a finite-alphabet data string of length n, the MPM code operates at O(log log n) levels sequentially. At each level, the MPM code detects matching patterns in the input data string (substrings of the data appearing in two or more nonoverlapping positions). The matching patterns detected at each level are of a fixed length which decreases by a constant factor from level to level, until this fixed length becomes one at the final level. The MPM code represents information about the matching patterns at each level as a string of tokens, with each token string encoded by an arithmetic encoder. From the concatenated encoded token strings, the decoder can reconstruct the data string via several rounds of parallel substitutions. A O(1/log n) maximal redundancy/sample upper bound is established for the MPM code with respect to any class of finite state sources of uniformly bounded complexity. We also show that the MPM code is of linear complexity in terms of time and space requirements. The results of some MPM code compression experiments are reported