Average profile and limiting distribution for a phrase size in the Lempel-Ziv parsing algorithm

Authors:
G. Louchard; W. Szpankowski
Affiliations:
Lab. d'Inf. Theorique, Univ. Libre de Bruxelles;-
Venue:
IEEE Transactions on Information Theory
Year:
2006



Abstract

Consider the parsing algorithm developed by Lempel and Ziv (1978) that partitions a sequence of length n into variable-length phrases (blocks) such that each new block is the shortest substring not previously seen as a phrase. In practice, the following parameters are of interest: the number of phrases, the size of a phrase, the number of phrases of a given size, and so forth. In this paper, we focus on the size of a randomly selected phrase and on the average number of phrases of a given size (the so-called average profile of phrase sizes). These parameters can be analyzed efficiently through a digital search tree representation. For a memoryless source with unequal symbol probabilities (the so-called asymmetric Bernoulli model), we prove that the size of a typical phrase is asymptotically normally distributed, with explicitly computed mean and variance. In terms of digital search trees, we prove a normal limiting distribution for the typical depth (i.e., the length of the path from the root to a randomly selected node). The latter result is proved by a technique that belongs to the toolkit of the “analytical analysis of algorithms” and appears to be novel in the context of data compression.
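
The parsing rule and the profile described in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names `lz78_phrases` and `phrase_size_profile` are hypothetical, and a plain dictionary trie stands in for the digital search tree representation, on the assumption that each stored phrase corresponds to one tree node whose depth equals the phrase length.

```python
from collections import Counter


def lz78_phrases(sequence):
    """Split `sequence` into phrases, each being the shortest prefix of the
    remaining input that has not yet been seen as a phrase.

    Illustrative helper (name is an assumption, not from the paper).
    """
    root = {}            # trie node: maps a symbol to its child node
    phrases = []
    node, start = root, 0
    for i, symbol in enumerate(sequence):
        if symbol in node:
            # Still inside a previously seen phrase: walk down the trie.
            node = node[symbol]
        else:
            # First unseen extension: record the phrase and add a new node.
            node[symbol] = {}
            phrases.append(sequence[start:i + 1])
            node, start = root, i + 1
    if start < len(sequence):
        # Leftover tail; it may repeat an earlier phrase.
        phrases.append(sequence[start:])
    return phrases


def phrase_size_profile(phrases):
    """Count how many phrases have each length (the empirical profile)."""
    return dict(sorted(Counter(len(p) for p in phrases).items()))


if __name__ == "__main__":
    parsed = lz78_phrases("11001010001000100")
    print(parsed)                       # ['1', '10', '0', '101', '00', '01', '000', '100']
    print(phrase_size_profile(parsed))  # {1: 2, 2: 3, 3: 3}
```

For this 17-symbol binary input the sketch yields eight phrases, and the profile counts two phrases of size 1, three of size 2, and three of size 3. The paper studies the average of such profiles, and the distribution of the size of a randomly selected phrase, when the input comes from an asymmetric Bernoulli source.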