Inference with multinomial data: why to weaken the prior strength

Authors:
Cassio P. De Campos; Alessio Benavoli
Affiliations:
Dalle Molle Institute for Artificial Intelligence, Manno-Lugano, Switzerland; Dalle Molle Institute for Artificial Intelligence, Manno-Lugano, Switzerland
Venue:
IJCAI'11: Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence - Volume Three
Year:
2011

Abstract

This paper considers inference from multinomial data and addresses the problem of choosing the strength of the Dirichlet prior under a mean-squared error criterion. We compare the Maximum Likelihood Estimator (MLE) with the most commonly used Bayesian estimators, which are obtained by assuming a Dirichlet prior with "noninformative" parameters: the parameters of the Dirichlet are all equal, and their sum is the so-called strength of the prior. Under this criterion, the MLE becomes preferable to the Bayesian estimators as the number of categories k of the multinomial increases, because the region in which the noninformative Bayesian estimators are dominant shrinks quickly as k grows. This can be avoided if the strength of the prior is not kept constant but is decreased with the number of categories. We argue that the strength should decrease at least k times faster than it does in the usual estimators.
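
The comparison described in the abstract can be made concrete with a small sketch. It assumes that the criterion is the squared error summed over the k categories, that n observations are drawn from a multinomial with true parameters theta, and that the Bayesian estimate is the posterior mean (n_i + s/k) / (n + s) under a symmetric Dirichlet prior of strength s. The function names, the symbols n, s, and theta, and the summed form of the criterion are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def mse_mle(theta: Sequence[float], n: int) -> float:
    """Total MSE of the MLE n_i / n, summed over categories (assumed criterion)."""
    sum_sq = sum(t * t for t in theta)
    # The MLE is unbiased, so its MSE is the summed variance theta_i (1 - theta_i) / n.
    return (1.0 - sum_sq) / n


def mse_bayes(theta: Sequence[float], n: int, s: float) -> float:
    """Total MSE of the posterior mean (n_i + s/k) / (n + s), summed over categories."""
    k = len(theta)
    sum_sq = sum(t * t for t in theta)
    # Variance part: sum_i n * theta_i * (1 - theta_i), before dividing by (n + s)^2.
    variance = n * (1.0 - sum_sq)
    # Squared-bias part: s^2 * sum_i (1/k - theta_i)^2 = s^2 * (sum_sq - 1/k).
    bias_sq = s * s * (sum_sq - 1.0 / k)
    return (variance + bias_sq) / (n + s) ** 2


if __name__ == "__main__":
    theta = [0.7, 0.2, 0.1]  # hypothetical true parameters
    print(mse_mle(theta, 20), mse_bayes(theta, 20, 1.0))
```

Under these assumptions, setting s = 0 recovers the MLE, and the Bayesian estimator is better at a given theta exactly when `mse_bayes` returns the smaller value, so evaluating both functions over a range of theta, k, and s is one way to explore where each estimator is preferable.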