A widely applicable Bayesian information criterion

Authors:
Sumio Watanabe
Affiliations:
Department of Computational Intelligence and Systems Science, Tokyo Institute of Technology, Yokohama, Japan
Venue:
The Journal of Machine Learning Research
Year:
2013

Abstract

A statistical model or a learning machine is called regular if the map taking a parameter to a probability distribution is one-to-one and its Fisher information matrix is always positive definite; otherwise, it is called singular. In regular statistical models, the Bayes free energy, defined as the negative logarithm of the Bayes marginal likelihood, can be asymptotically approximated by the Schwarz Bayesian information criterion (BIC), whereas in singular models this approximation does not hold. It was recently proved that the Bayes free energy of a singular model is asymptotically given by a generalized formula in which a birational invariant, the real log canonical threshold (RLCT), replaces half the number of parameters in BIC. Theoretical values of the RLCT are now being derived for several statistical models using algebraic geometric methods. However, it has been difficult to estimate the Bayes free energy from training samples alone, because the RLCT depends on the unknown true distribution. In the present paper, we define a widely applicable Bayesian information criterion (WBIC) as the average of the negative log likelihood over the posterior distribution with inverse temperature 1/log n, where n is the number of training samples. We mathematically prove that WBIC has the same asymptotic expansion as the Bayes free energy, even if the statistical model is singular for the true distribution or the true distribution is unrealizable by the model. Since WBIC can be computed numerically without any information about the true distribution, it generalizes BIC to singular statistical models.
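As an illustration only, and not the paper's own code, here is a minimal Python sketch of the definition above. Write n L_n(w) = -sum_i log p(x_i|w) and let the tempered posterior be proportional to phi(w) prod_i p(x_i|w)^beta with beta = 1/log n. WBIC is then the average of n L_n(w) under that posterior. The sketch assumes a deliberately simple, regular model, x_i ~ N(mu, 1) with a conjugate prior mu ~ N(0, tau2). The model, the prior, and every function name are hypothetical choices made for this example. Because the tempered posterior is Gaussian in this case, WBIC can be evaluated both exactly and by averaging over posterior draws, and it can be compared with the exact free energy and with BIC (written on the free-energy scale, n L_n(MLE) + (d/2) log n).

```python
import math
import random

# Hypothetical example model (not from the paper): x_i ~ N(mu, 1),
# prior phi(mu) = N(0, tau2). It is regular, with d = 1 parameter.


def neg_log_lik(xs, mu):
    """n * L_n(mu) = -sum_i log p(x_i | mu) for the unit-variance normal."""
    n = len(xs)
    return 0.5 * n * math.log(2 * math.pi) + 0.5 * sum((x - mu) ** 2 for x in xs)


def tempered_posterior(xs, tau2, beta):
    """Mean and variance of the density proportional to
    phi(mu) * prod_i p(x_i | mu) ** beta, which is Gaussian here."""
    precision = beta * len(xs) + 1.0 / tau2
    var = 1.0 / precision
    return beta * sum(xs) * var, var


def wbic_closed_form(xs, tau2):
    """WBIC = E_beta[n L_n(mu)] with beta = 1 / log n, evaluated exactly."""
    n = len(xs)
    mean, var = tempered_posterior(xs, tau2, 1.0 / math.log(n))
    # E[(x_i - mu)^2] = (x_i - mean)^2 + var under the tempered posterior.
    return neg_log_lik(xs, mean) + 0.5 * n * var


def wbic_monte_carlo(xs, tau2, draws, rng):
    """The same posterior average estimated from random draws; for models
    without a closed-form posterior these draws would come from MCMC."""
    n = len(xs)
    mean, var = tempered_posterior(xs, tau2, 1.0 / math.log(n))
    sd = math.sqrt(var)
    return sum(neg_log_lik(xs, rng.gauss(mean, sd)) for _ in range(draws)) / draws


def free_energy_exact(xs, tau2):
    """Bayes free energy F = -log integral phi(mu) prod_i p(x_i | mu) dmu."""
    n, s = len(xs), sum(xs)
    sq = sum(x * x for x in xs)
    a = n + 1.0 / tau2  # posterior precision at beta = 1
    return (0.5 * n * math.log(2 * math.pi) + 0.5 * math.log(tau2 * a)
            + 0.5 * (sq - s * s / a))


def bic(xs):
    """Schwarz BIC on the free-energy scale: n L_n(MLE) + (d / 2) log n, d = 1."""
    n = len(xs)
    return neg_log_lik(xs, sum(xs) / n) + 0.5 * math.log(n)


if __name__ == "__main__":
    rng = random.Random(0)
    xs = [rng.gauss(0.5, 1.0) for _ in range(2000)]
    print("F (exact)    :", free_energy_exact(xs, 1.0))
    print("WBIC (exact) :", wbic_closed_form(xs, 1.0))
    print("WBIC (MC)    :", wbic_monte_carlo(xs, 1.0, 400, rng))
    print("BIC          :", bic(xs))
```

For this regular toy model, the printed values should lie close together. In a singular model, the (d/2) log n term of BIC would be replaced by lambda log n, where lambda is the RLCT. WBIC still needs no knowledge of lambda, only samples from the tempered posterior at beta = 1/log n.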