Asymptotic properties of data compression and suffix trees

Authors:
W. Szpankowski
Affiliations:
Dept. of Comput. Sci., Purdue Univ., West Lafayette, IN
Venue:
IEEE Transactions on Information Theory
Year:
2006

Abstract

Recently, Wyner and Ziv (see ibid., vol. 35, pp. 1250-1258, 1989) proved that the typical length of a repeated subword found within the first n positions of a stationary ergodic sequence is (1/h) log n in probability, where h is the entropy of the alphabet. This finding was used to obtain several insights into certain universal data compression schemes, most notably the Lempel-Ziv data compression algorithm. Wyner and Ziv also conjectured that their result can be extended to the stronger almost sure convergence. In this paper, we settle this conjecture in the negative in the so-called right domain asymptotic, that is, during a dynamic phase of expanding the database. We prove, under an additional assumption involving mixing conditions, that the length of a typical repeated subword oscillates almost surely (a.s.) between (1/h1) log n and (1/h2) log n, where D
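
As a rough numerical illustration of the (1/h) log n scaling attributed to Wyner and Ziv in the abstract, the following sketch simulates a memoryless binary source and measures how far the suffix starting at position n can be matched against the first n positions. It is not taken from the paper. The Bernoulli source, the brute-force search, and the names entropy, match_length, and average_match_length are all assumptions made for illustration, and a memoryless source is only one special case of the stationary ergodic setting.

```python
import math
import random


def entropy(p):
    """Shannon entropy in nats of a Bernoulli(p) source (assumed model)."""
    return -(p * math.log(p) + (1 - p) * math.log(1 - p))


def match_length(seq, n):
    """Longest prefix of seq[n:] that also occurs starting at some i < n.

    Brute force over all n candidate start positions; overlapping
    matches are allowed.
    """
    best = 0
    for i in range(n):
        k = 0
        while n + k < len(seq) and seq[i + k] == seq[n + k]:
            k += 1
        best = max(best, k)
    return best


def average_match_length(n, p, trials=20, margin=200, seed=0):
    """Average match length over independent Bernoulli(p) sequences.

    `margin` extra symbols are generated past position n so that a
    match is very unlikely to be cut short by the end of the sequence.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seq = [1 if rng.random() < p else 0 for _ in range(n + margin)]
        total += match_length(seq, n)
    return total / trials


if __name__ == "__main__":
    p = 0.3
    for n in (250, 1000, 4000):
        predicted = math.log(n) / entropy(p)  # (1/h) log n, natural logs
        observed = average_match_length(n, p)
        print(n, round(observed, 2), round(predicted, 2))
```

Natural logarithms are used for both h and log n, so the ratio (log n)/h does not depend on the base. The simulation only shows average behaviour at finite n; it says nothing about the almost sure oscillation that the paper establishes.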