A lower bound on compression of unknown alphabets

Authors:
Nikola Jevtić; Alon Orlitsky; Narayana P. Santhanam
Affiliations:
ECE Department, University of California, San Diego, La Jolla, CA; ECE Department, University of California, San Diego, La Jolla, CA and CSE Department, University of California, San Diego, La Jolla, CA; ECE Department, University of California, San Diego, La Jolla, CA
Venue:
Theoretical Computer Science
Year:
2005

Abstract

Many applications call for universal compression of strings over large, possibly infinite, alphabets. However, it has long been known that the resulting redundancy is infinite even for i.i.d. distributions. It was recently shown that the redundancy of the strings' patterns, which abstract the values of the symbols, retaining only their relative precedence, is sublinear in the blocklength $n$, hence the per-symbol redundancy diminishes to zero. In this paper we show that pattern redundancy is at least $(1.5\log_2 e)\,n^{1/3}$ bits. To do so, we construct a generating function whose coefficients lower bound the redundancy, and use Hayman's saddle-point approximation technique to determine the coefficients' asymptotic behavior.
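
As a rough illustration of the notion of a pattern used in the abstract, the sketch below replaces each symbol by the order in which it first appeared, discarding the symbol values themselves. It also evaluates the expression (1.5 log_2 e) n^(1/3) for a given blocklength. The function names `pattern` and `lower_bound_bits` are hypothetical and do not come from the paper, and since the abstract describes the analysis as asymptotic, plugging in a finite n is only meant to give a sense of scale.

```python
import math


def pattern(s):
    """Map each symbol to the 1-based order of its first appearance.

    Hypothetical helper: symbol values are discarded and only their
    relative precedence is kept, as the abstract describes.
    """
    first_seen = {}
    out = []
    for sym in s:
        if sym not in first_seen:
            # A new symbol receives the next unused index.
            first_seen[sym] = len(first_seen) + 1
        out.append(first_seen[sym])
    return out


def lower_bound_bits(n):
    """Evaluate (1.5 * log2(e)) * n**(1/3) for blocklength n.

    Assumption: this simply plugs n into the expression quoted in the
    abstract; the paper's result concerns asymptotic behavior.
    """
    return 1.5 * math.log2(math.e) * n ** (1.0 / 3.0)


print(pattern("abracadabra"))  # [1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1]
```

Two strings that differ only by a relabeling of symbols, such as "xyzzy" and "abccb", share the same pattern, which is why compressing patterns sidesteps the cost of describing an unknown alphabet.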