The geometry of prior selection

  • Authors: Hichem Snoussi

  • Affiliations: IRCCyN, Institut de Recherche en Communications et Cybernétiques de Nantes, Ecole Centrale de Nantes, 1, Rue de la Noë, BP 92101, 44321, Nantes, France

  • Venue: Neurocomputing
  • Year: 2005

Quantified Score

Hi-index 0.01

Abstract

This contribution is devoted to the selection of a prior in a Bayesian learning framework. There is an extensive literature on the construction of non-informative priors, and the subject seems far from a definitive solution [Kass and Wasserman, Formal rules for selecting prior distributions: a review and annotated bibliography, Technical Report No. 583, Department of Statistics, Carnegie Mellon University, 1994]. We consider this problem in the light of recently developed information geometric tools [Amari and Nagaoka, Methods of information geometry, in: Translations of Mathematical Monographs, AMS, vol. 191, Oxford University Press, Oxford, 2000]. The differential geometric analysis allows the prior selection problem to be formulated on a general manifold-valued set of probability distributions. In order to construct the prior distribution, we propose a criterion expressing the trade-off between decision error and a uniformity constraint. The solution has an explicit expression obtained by variational calculus. In addition, it has two important invariance properties: invariance to the dominating measure of the data space and invariance to the parametrization of a restricted parametric manifold. We show how the construction of a prior by projection is the best way to take into account the restriction to a particular family of parametric models. For instance, we apply this procedure to autoparallel restricted families. Two practical examples illustrate the proposed construction of the prior. The first example deals with learning a mixture of multivariate Gaussians in a classification perspective. We show in this learning problem how penalizing the likelihood by the proposed prior eliminates the degeneracy that occurs when approaching singularity points. The second example treats the blind source separation problem.
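
The degeneracy mentioned for the Gaussian mixture example can be illustrated with a minimal sketch. It assumes a one-dimensional, two-component mixture with equal weights, a toy data set, and a generic inverse-gamma-style penalty on the variance. That penalty is only a stand-in chosen for illustration; it is not the prior constructed in the paper, whose explicit form the abstract does not give. All function names and constants below are hypothetical.

```python
import math

def log_normal_pdf(x, mean, sigma):
    """Log density of a 1-D Gaussian N(mean, sigma^2) at x."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mean) / sigma) ** 2)

def mixture_log_likelihood(data, mean2, sigma2):
    """Log-likelihood of 0.5*N(0, 1) + 0.5*N(mean2, sigma2^2)."""
    total = 0.0
    for x in data:
        a = math.log(0.5) + log_normal_pdf(x, 0.0, 1.0)
        b = math.log(0.5) + log_normal_pdf(x, mean2, sigma2)
        m = max(a, b)  # log-sum-exp for numerical stability
        total += m + math.log(math.exp(a - m) + math.exp(b - m))
    return total

def log_variance_penalty(sigma, shape=1.0, scale=1.0):
    """Unnormalized inverse-gamma log density on the variance.

    Assumption: a stand-in penalty, NOT the prior proposed in the paper.
    """
    var = sigma ** 2
    return -(shape + 1.0) * math.log(var) - scale / var

# Toy data (hypothetical). The second component is centred on data[0].
data = [-1.2, -0.4, 0.3, 0.9, 2.5]
sigmas = [1.0, 1e-1, 1e-2, 1e-3]

# Raw log-likelihood: keeps growing as the component collapses on one point.
raw = [mixture_log_likelihood(data, data[0], s) for s in sigmas]

# Penalized log-likelihood: the penalty dominates as sigma shrinks.
penalized = [r + log_variance_penalty(s) for r, s in zip(raw, sigmas)]

for s, r, p in zip(sigmas, raw, penalized):
    print(f"sigma={s:g}  raw={r:.3f}  penalized={p:.3f}")
```

In this sketch the raw log-likelihood increases without bound as the second component's standard deviation shrinks toward zero, while the penalized objective decreases, so the singular configuration is no longer a maximizer. Any prior that vanishes fast enough near the singularity would show the same qualitative effect.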