We propose a new method for estimating the probability mass function (pmf) of a discrete, finite random variable from a small sample. We focus on the observed counts (the number of times each value appears in the sample) and define the maximum likelihood set (MLS) as the set of pmfs under which the observed count vector is more likely than any other count vector attainable with the same sample size. We characterize the MLS in detail in this article. We show that the MLS is a diamond-shaped subset of the probability simplex [0,1]^k bounded by at most k(k-1) hyperplanes, where k is the number of possible values of the random variable. The MLS always contains the empirical distribution, as well as a family of Bayesian estimators based on Dirichlet priors, notably the well-known Laplace estimator. We propose selecting from the MLS the pmf that is closest to a fixed pmf encoding prior knowledge. When the Kullback-Leibler distance is used for this selection, the optimization problem reduces to minimizing a convex function over a domain defined by linear inequalities, for which standard numerical procedures are available. We apply this estimator to language modeling, using Zipf's law to encode prior knowledge, and show that the method achieves state-of-the-art results while being conceptually simpler than most competing methods.
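To make the procedure concrete, here is a minimal sketch in Python of how such an estimator might be implemented; it is not the authors' code. The pairwise inequality form of the MLS constraints (obtained by requiring that moving one observation between any two symbols not increase the multinomial likelihood, which yields the at most k(k-1) hyperplanes mentioned above), the direction KL(p || q) of the objective, and the choice of SciPy's SLSQP solver are all assumptions made for illustration.

    # A minimal sketch of MLS-based pmf estimation (not the authors' code).
    # Assumed: the pairwise inequality characterization of the MLS, the
    # objective direction KL(p || q), and SciPy's SLSQP solver.
    import numpy as np
    from scipy.optimize import minimize

    def mls_estimate(counts, prior):
        # counts: observed counts c_1..c_k; prior: pmf q encoding prior knowledge.
        c = np.asarray(counts, dtype=float)
        q = np.asarray(prior, dtype=float)
        k = len(c)

        # Objective: KL(p || q); the direction is an assumption, since the
        # abstract only speaks of the "Kullback-Leibler distance".
        def kl(p):
            p = np.clip(p, 1e-12, None)
            return float(np.sum(p * np.log(p / q)))

        # MLS membership: shifting one observation from symbol i to symbol j
        # must not increase the multinomial likelihood of the counts, i.e.
        # (c_j + 1) * p_i - c_i * p_j >= 0 for all i != j: at most k(k-1)
        # hyperplanes, matching the abstract.
        cons = [{"type": "eq", "fun": lambda p: p.sum() - 1.0}]
        for i in range(k):
            for j in range(k):
                if i != j:
                    cons.append({
                        "type": "ineq",
                        "fun": lambda p, i=i, j=j:
                            (c[j] + 1.0) * p[i] - c[i] * p[j],
                    })

        # The Laplace estimate lies strictly inside the MLS (the abstract notes
        # it is always a member), so it is a safe feasible starting point.
        p0 = (c + 1.0) / (c.sum() + k)
        res = minimize(kl, p0, method="SLSQP",
                       bounds=[(1e-12, 1.0)] * k, constraints=cons)
        return res.x

    # Example: a Zipf prior over k = 5 ranked word types, q_r proportional to 1/r.
    ranks = np.arange(1, 6)
    zipf_prior = (1.0 / ranks) / np.sum(1.0 / ranks)
    print(mls_estimate([3, 1, 1, 0, 0], zipf_prior))

Because the constraints are linear and the KL objective is convex in p, any local solution the solver returns is the global minimum over the MLS, which is what makes this selection step computationally routine.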