The automatic relevance determination (ARD) prior shows good performance in many applications, and it has recently been applied to brain current estimation with the variational method. Users of the ARD tend to focus on one of its benefits, sparsity; in this paper, we focus on another benefit, generalization. We clarify the generalization error of the ARD when a certain class of prior distributions is used, and show that good generalization is caused by the singularities of the ARD. Although sparsity is not observed in that case, the mechanism by which the singularities provide good generalization suggests the mechanism by which they also provide sparsity.
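
The abstract refers to the sparsity mechanism of the ARD, in which per-weight precision hyperparameters grow without bound and drive the corresponding weights toward zero. As a point of reference only, the sketch below shows ARD for linear regression with MacKay-style evidence-maximization updates; it is not the paper's variational-Bayes formulation or its brain-current-estimation setting, and the function name `ard_linear_regression`, the toy data, and the numerical tolerances are illustrative assumptions.

```python
import numpy as np

def ard_linear_regression(X, y, n_iter=200, alpha_init=1.0, beta_init=1.0, alpha_cap=1e6):
    """Minimal ARD sketch for linear regression (evidence-maximization updates).

    Each weight w_i has its own Gaussian prior N(0, 1/alpha_i); as alpha_i
    grows large, the posterior mean of w_i is pushed toward zero (sparsity).
    """
    N, D = X.shape
    alpha = np.full(D, alpha_init)   # per-weight precision hyperparameters
    beta = beta_init                 # observation-noise precision

    for _ in range(n_iter):
        # Posterior over weights given current hyperparameters:
        #   Sigma = (beta * X^T X + diag(alpha))^{-1},  mu = beta * Sigma X^T y
        Sigma = np.linalg.inv(beta * X.T @ X + np.diag(alpha))
        mu = beta * Sigma @ X.T @ y

        # Effective number of well-determined parameters per weight
        gamma = 1.0 - alpha * np.diag(Sigma)

        # Re-estimate the hyperparameters (type-II maximum likelihood)
        alpha = gamma / (mu ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((y - X @ mu) ** 2) + 1e-12)

        alpha = np.minimum(alpha, alpha_cap)  # cap for numerical stability

    return mu, alpha, beta

# Toy usage: only the first 3 of 10 features are relevant.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 0.5]
y = X @ w_true + 0.1 * rng.normal(size=100)

mu, alpha, beta = ard_linear_regression(X, y)
print("estimated weights:", np.round(mu, 2))
print("pruned (alpha large):", alpha > 1e3)
```

In this sketch the precisions of the irrelevant features diverge (here they hit the numerical cap) and their posterior mean weights collapse toward zero, which is the pruning behaviour usually associated with the ARD; the paper's analysis instead concerns the generalization error arising from the singularities of the same model.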