Always Good Turing: Asymptotically Optimal Probability Estimation

Authors:
Alon Orlitsky, Narayana P. Santhanam, Junan Zhang
Venue:
FOCS '03 Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science
Year:
2003

Abstract

While deciphering the German Enigma code during World War II, I.J. Good and A.M. Turing considered the problem of estimating a probability distribution from a sample of data. They derived a surprising and unintuitive formula that has since been used in a variety of applications and studied by a number of researchers. Borrowing an information-theoretic and machine-learning framework, we define the attenuation of a probability estimator as the largest possible ratio between the per-symbol probability assigned to an arbitrarily long sequence by any distribution and the corresponding probability assigned by the estimator. We show that some common estimators have infinite attenuation and that the attenuation of the Good-Turing estimator is low, yet larger than one. We then derive an estimator whose attenuation is one, namely, as the length of any sequence increases, the per-symbol probability assigned by the estimator is at least the highest possible. Interestingly, some of the proofs use celebrated results by Hardy and Ramanujan on the number of partitions of an integer. To better understand the behavior of the estimator, we study the probability it assigns to several simple sequences. We show that for some sequences this probability agrees with our intuition, while for others it is rather unexpected.
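The abstract refers to the Good-Turing formula without stating it. In its commonly cited textbook form, let φ_t be the number of distinct symbols that appear exactly t times in a sample of size n. The total probability of all symbols seen t times is then estimated as (t+1)·φ_{t+1}/n, shared equally among those φ_t symbols, and the total probability of all unseen symbols as φ_1/n. The sketch below is a minimal illustration of that raw formula only; it is not the attenuation-one estimator described above, nor necessarily the exact variant the authors analyze, and the names `good_turing` and `prevalence` are hypothetical.

```python
from collections import Counter
from fractions import Fraction


def good_turing(sample):
    """Raw (unsmoothed) Good-Turing estimates for a list of observed symbols.

    Returns (per_symbol, unseen_mass):
      per_symbol  -- dict mapping each observed symbol to its estimated probability
      unseen_mass -- estimated total probability of all symbols not yet observed
    """
    n = len(sample)
    counts = Counter(sample)               # symbol -> t, its number of appearances
    prevalence = Counter(counts.values())  # t -> phi_t, symbols seen exactly t times

    per_symbol = {}
    for symbol, t in counts.items():
        # Mass of all symbols seen t times is (t+1) * phi_{t+1} / n,
        # split evenly among the phi_t symbols seen t times.
        # A missing key in a Counter reads as 0, so phi_{t+1} may be 0 here.
        per_symbol[symbol] = Fraction((t + 1) * prevalence[t + 1], n * prevalence[t])

    # Mass reserved for symbols never observed: phi_1 / n.
    unseen_mass = Fraction(prevalence[1], n)
    return per_symbol, unseen_mass


probs, unseen = good_turing(list("abracadabra"))
```

On `abracadabra` (n = 11), `c` and `d` each receive 2/11 and the unseen symbols jointly receive 2/11, while `a`, `b`, and `r` receive 0 because no symbol appears 3 or 6 times. The estimates therefore sum to 6/11 rather than 1. This is one concrete way the raw formula behaves unintuitively, and it is why practical variants typically smooth φ_t or revert to empirical frequencies for frequently seen symbols.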