Neural Computation
It is well known that for unidentifiable models, Bayes estimation provides much better generalization performance than maximum likelihood (ML) estimation. However, accurately approximating Bayes estimation by Markov chain Monte Carlo methods requires huge computational costs. As an alternative, a tractable approximation method called the variational Bayes (VB) approach has recently been proposed and has been attracting attention. Its advantage over the expectation-maximization (EM) algorithm, which is often used to realize ML estimation, has been shown experimentally in many applications; however, it has not yet been shown theoretically. In this letter, through analysis of the simplest unidentifiable models, we theoretically establish several properties of the VB approach. We first prove that in three-layer linear neural networks, the VB approach is asymptotically equivalent to a positive-part James-Stein type shrinkage estimation. We then theoretically clarify its free energy, generalization error, and training error. Comparing them with those of ML estimation and Bayes estimation, we discuss the advantages of the VB approach. We also show that, unlike in Bayes estimation, the VB free energy and generalization error are not simply related to each other: in typical cases the VB free energy approximates the Bayes free energy well, while the VB generalization error differs significantly from the Bayes generalization error.
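As background for the shrinkage result stated above: the classical positive-part James-Stein estimator shrinks the ML estimate of a $d$-dimensional Gaussian mean toward the origin and truncates the shrinkage factor at zero. The paper proves an asymptotic equivalence to an estimator of this type inside linear neural networks; the display below is the textbook form for a single observation $x \sim \mathcal{N}(\theta, \sigma^{2} I_{d})$ with known $\sigma^{2}$, given as a point of reference rather than the paper's exact solution:

\[
  \hat{\theta}^{\mathrm{JS}+}(x)
  = \left( 1 - \frac{(d-2)\,\sigma^{2}}{\lVert x \rVert^{2}} \right)_{+} x,
  \qquad (a)_{+} := \max(a,\, 0).
\]

For $d \ge 3$ this estimator dominates the ML estimator $\hat{\theta}^{\mathrm{ML}}(x) = x$ in squared-error risk, which illustrates the sense in which James-Stein type shrinkage can improve generalization performance over ML estimation.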