Objective priors from maximum entropy in data classification

Authors:
Francesco A. N. Palmieri;Domenico Ciuonzo
Affiliations:
Dipartimento di Ingegneria dell'Informazione, Seconda Universitá di Napoli (SUN), Real Casa dell'Annunziata, via Roma, 29, 81031 Aversa (CE), Italy;Dipartimento di Ingegneria dell'Informazione, Seconda Universitá di Napoli (SUN), Real Casa dell'Annunziata, via Roma, 29, 81031 Aversa (CE), Italy
Venue:
Information Fusion
Year:
2013

Abstract

Without knowledge of the prior distribution, applying Bayes' rule to classification problems that operate on small data sets may be questionable. Uniform or arbitrary priors may give classification answers that, even in simple examples, contradict common sense about the problem. Entropic priors (EPs), obtained by applying the maximum entropy (ME) principle, seem to provide good objective answers in practical cases, leading to more conservative Bayesian inferences. EPs are derived and applied to classification tasks when only the likelihood functions are available. In this paper, for inference based on a single sample, we review the use of EPs and compare them with priors obtained by maximizing the mutual information between observations and classes. This last criterion coincides with maximizing the KL divergence between posteriors and priors, which for large sample sets leads to the well-known reference (or Bernardo's) priors. Our comparison on single samples puts both approaches in perspective and clarifies their differences and potential. A combinatorial justification for EPs, inspired by Wallis' combinatorial argument for the definition of entropy, is also included. We also consider the application of EPs to sequences (multiple samples), which may be affected by excessive domination of the class with the maximum entropy, and give a solution that guarantees posterior consistency. An explicit iterative algorithm is proposed for determining EPs solely from knowledge of the likelihood functions. Simulations that compare EPs with uniform priors on short sequences are also included.
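
The single-sample idea can be illustrated with a minimal sketch. It is not the iterative algorithm proposed in the paper, which the abstract does not spell out. It assumes discrete likelihoods and assumes that the EP is the class prior maximizing the joint entropy of class and observation; under that assumption the maximizer has the closed form prior_i ∝ exp(H_i), where H_i is the entropy of the i-th likelihood. The function names `entropic_prior` and `posterior` and the example likelihood table are hypothetical, chosen only for illustration.

```python
import numpy as np

def entropic_prior(likelihoods):
    """Closed-form maximizer of the joint entropy H(C, X) over class priors.

    Assumption (not taken from the abstract): likelihoods is an array of
    shape (n_classes, n_symbols) whose rows are discrete distributions.
    """
    p = np.asarray(likelihoods, dtype=float)
    # Entropy of each likelihood in nats, using the convention 0*log(0) = 0.
    safe = np.where(p > 0, p, 1.0)
    h = -np.sum(p * np.log(safe), axis=1)
    # prior_i proportional to exp(H_i); subtract the max for numerical stability.
    w = np.exp(h - h.max())
    return w / w.sum()

def posterior(prior, likelihoods, x):
    """Bayes' rule for a single observed symbol index x."""
    p = np.asarray(likelihoods, dtype=float)
    joint = np.asarray(prior, dtype=float) * p[:, x]
    return joint / joint.sum()

# Hypothetical two-class example: one broad and one sharply peaked likelihood.
likelihoods = np.array([
    [0.25, 0.25, 0.25, 0.25],  # class 0: high entropy
    [0.97, 0.01, 0.01, 0.01],  # class 1: low entropy
])

ep = entropic_prior(likelihoods)
uniform = np.full(2, 0.5)

# Posterior after observing symbol 0, under each prior.
post_ep = posterior(ep, likelihoods, 0)
post_uniform = posterior(uniform, likelihoods, 0)

print("entropic prior:", ep)
print("posterior with entropic prior:", post_ep)
print("posterior with uniform prior:", post_uniform)
```

In this toy case the higher-entropy class receives the larger prior weight, so the posterior for the peaked class after observing symbol 0 is less extreme than with a uniform prior, in line with the more conservative inferences described above.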