A suboptimal lossy data compression based on approximate pattern matching

Authors:
T. Luczak; W. Szpankowski
Affiliations:
Math. Inst., Polish Acad. of Sci., Poznan;-
Venue:
IEEE Transactions on Information Theory
Year:
2006

Abstract

A practical suboptimal (variable source coding) algorithm for lossy data compression is presented. The scheme is based on approximate string matching and naturally extends the lossless Lempel-Ziv (1977) data compression scheme. Among other things, we consider the typical length of an approximately repeated pattern within the first $n$ positions of a stationary mixing sequence when up to $D$ percent of mismatches are allowed. We prove that there exists a constant $r_0(D)$ such that the length of such an approximately repeated pattern converges in probability to $\frac{1}{r_0(D)}\log n$, but almost surely oscillates between $\frac{1}{r_{-\infty}(D)}\log n$ and $\frac{2}{r_1(D)}\log n$, where $r_{-\infty}(D) > r_0(D) > r_1(D)/2$ are constants. These constants are natural generalizations of Rényi entropies to the lossy setting. More importantly, we show that the compression ratio of a lossy data compression scheme based on such approximate pattern matching is asymptotically equal to $r_0(D)$. We also establish the asymptotic behavior of the so-called approximate waiting time $N_\ell$, defined as the time until a pattern of length $\ell$ first repeats approximately. We prove that $(\log N_\ell)/\ell \to r_0(D)$ in probability as $\ell \to \infty$. In general, $r_0(D) > R(D)$, where $R(D)$ is the rate-distortion function. Thus, for stationary mixing sequences, we settle in the negative the problem investigated by Steinberg and Gutman by showing that a lossy extension of the Wyner-Ziv (1989) scheme cannot be optimal.
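To make the quantities in the abstract concrete, here is a minimal Python sketch. It assumes Hamming distortion (the fraction of mismatched positions, so "D percent of mismatches" becomes a fraction D in [0, 1]), and every function name is hypothetical. `longest_approx_match` finds the longest prefix of the upcoming data that approximately occurs in a database; `approx_waiting_time` computes $N_\ell$ under the assumed definition "smallest shift t >= 1 at which the first $\ell$ symbols reappear within distortion D"; and `lossy_lz_encode` is a hypothetical greedy LZ77-style parse that copies approximately matching phrases from the already-reconstructed output. It illustrates the idea only; it is not necessarily the authors' exact construction, and it omits the efficient search and codeword encoding a real coder would need.

```python
from typing import List, Optional, Sequence, Tuple, Union

# A phrase is either ("lit", symbol) or ("copy", start, length).
Phrase = Union[Tuple[str, str], Tuple[str, int, int]]


def hamming_distortion(a: Sequence, b: Sequence) -> float:
    """Fraction of positions at which equal-length sequences a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    if len(a) == 0:
        return 0.0
    return sum(u != v for u, v in zip(a, b)) / len(a)


def longest_approx_match(db: Sequence, src: Sequence, D: float) -> Tuple[int, int]:
    """Longest prefix of src occurring in db with Hamming distortion <= D.

    Returns (start, length), or (-1, 0) if no nonempty prefix qualifies.
    Assumption: the compared window db[start:start+length] must fit inside db.
    """
    best_start, best_len = -1, 0
    for i in range(len(db)):
        mismatches = 0
        for k in range(1, min(len(db) - i, len(src)) + 1):
            mismatches += int(db[i + k - 1] != src[k - 1])
            # No early exit: window distortion is not monotone in k.
            if mismatches <= D * k and k > best_len:
                best_start, best_len = i, k
    return best_start, best_len


def approx_waiting_time(x: Sequence, ell: int, D: float) -> Optional[int]:
    """Smallest t >= 1 with d(x[t:t+ell], x[:ell]) <= D, or None if none in x."""
    pattern = x[:ell]
    for t in range(1, len(x) - ell + 1):
        if hamming_distortion(x[t:t + ell], pattern) <= D:
            return t
    return None


def lossy_lz_encode(x: str, D: float) -> List[Phrase]:
    """Hypothetical greedy lossy LZ77-style parse.

    Matches are searched in the reconstruction y, which the decoder also
    builds, so the decoder rebuilds y exactly and every copied phrase
    differs from the input in at most a fraction D of its positions.
    """
    phrases: List[Phrase] = []
    y: List[str] = []  # reconstruction shared by encoder and decoder
    n = 0
    while n < len(x):
        start, length = longest_approx_match(y, x[n:], D)
        if length == 0:
            phrases.append(("lit", x[n]))  # no usable match: send a literal
            y.append(x[n])
            n += 1
        else:
            phrases.append(("copy", start, length))
            y.extend(y[start:start + length])
            n += length
    return phrases


def lossy_lz_decode(phrases: List[Phrase]) -> str:
    """Rebuild the reconstruction y from a list of phrases."""
    y: List[str] = []
    for p in phrases:
        if p[0] == "lit":
            y.append(p[1])
        else:
            _, start, length = p
            y.extend(y[start:start + length])
    return "".join(y)
```

For example, `longest_approx_match("abcd", "abxd", 0.25)` returns `(0, 4)` even though the length-3 prefix `"abx"` has no match within distortion 0.25. Because window distortion is not monotone in the length, the search scans every length instead of stopping at the first failure. Matching against the reconstruction `y` rather than the original input keeps encoder and decoder in sync, so each copied phrase, and hence the whole output, has distortion at most D; with D = 0 the parse is lossless.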