The Bayesian Evidence Scheme for Regularizing Probability-Density Estimating Neural Networks

Authors:
Dirk Husmeier
Affiliations:
Biomathematics and Statistics Scotland, Scottish Crop Research Institute, Invergowrie, Dundee DD2 5DA, U.K.
Venue:
Neural Computation
Year:
2000

Abstract

Training probability-density estimating neural networks with the expectation-maximization (EM) algorithm aims to maximize the likelihood of the training set and therefore tends to overfit when the training data are sparse. In this article, a regularization method for mixture models with generalized linear kernel centers is proposed, which adopts the Bayesian evidence approach and optimizes the hyperparameters of the prior by type II maximum likelihood. This involves a marginalization over the parameters, which is carried out with a Laplace approximation and requires the derivation of the Hessian of the log-likelihood function. Incorporating this approach into the standard training scheme leads to a modified form of the EM algorithm, which includes a regularization term and adapts the hyperparameters on-line after each EM cycle. The article presents applications of this scheme to classification problems, the prediction of stochastic time series, and latent space models.
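The modified EM cycle described above can be illustrated with a minimal NumPy sketch. It is not the article's implementation and rests on several assumptions. The target y is scalar, and each kernel centre is generalized linear, mu_k(x) = w_k . phi(x), with phi given by a hypothetical fixed random basis (`random_basis`). Each kernel has its own Gaussian prior w_k ~ N(0, I/alpha_k). The Hessian used for the Laplace step is only the data term beta_k Phi^T R_k Phi of the expected complete-data log-likelihood with the responsibilities held fixed. This is a simplification of the full log-likelihood Hessian derived in the article. The hyperparameter alpha_k is re-estimated after every cycle with MacKay's fixed point alpha_k = gamma_k / ||w_k||^2, where gamma_k = sum_i lambda_i / (lambda_i + alpha_k). The kernel variances use plain weighted maximum likelihood.

```python
import numpy as np


def random_basis(x, n_hidden=20, seed=0):
    """Hypothetical fixed basis phi(x): a bias column plus random tanh units (assumption)."""
    x = np.asarray(x, dtype=float)
    if x.ndim == 1:
        x = x[:, None]
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(x.shape[1], n_hidden))
    b_in = rng.normal(size=n_hidden)
    return np.hstack([np.ones((x.shape[0], 1)), np.tanh(x @ W_in + b_in)])


def regularized_em(Phi, y, K=2, n_cycles=30, alpha0=1.0, seed=0):
    """Sketch: mixture of K Gaussians with centres mu_k(x) = w_k . phi(x) and prior w_k ~ N(0, I/alpha_k)."""
    rng = np.random.default_rng(seed)
    N, M = Phi.shape
    pi = np.full(K, 1.0 / K)                 # mixing weights
    W = rng.normal(scale=0.5, size=(K, M))   # kernel-centre weights w_k
    var = np.full(K, np.var(y))              # kernel variances sigma_k^2
    alpha = np.full(K, alpha0)               # prior precisions (hyperparameters)
    gamma = np.zeros(K)                      # effective number of well-determined weights
    loglik = []
    for _ in range(n_cycles):
        # E-step: responsibilities r_nk, computed in log space for numerical stability.
        mu = Phi @ W.T                                            # (N, K) kernel centres
        log_p = (np.log(np.maximum(pi, 1e-300))
                 - 0.5 * np.log(2 * np.pi * var)
                 - 0.5 * (y[:, None] - mu) ** 2 / var)
        m = log_p.max(axis=1, keepdims=True)
        R = np.exp(log_p - m)
        s = R.sum(axis=1, keepdims=True)
        R /= s
        loglik.append(float(np.sum(m + np.log(s))))               # unregularized log-likelihood
        # M-step, including the weight-decay term that comes from the log prior.
        Nk = R.sum(axis=0)
        pi = Nk / N
        for k in range(K):
            beta = 1.0 / var[k]
            G = beta * (Phi.T * R[:, k]) @ Phi                    # data term of the Hessian (simplified)
            A = G + alpha[k] * np.eye(M)                          # Hessian of the penalized objective
            W[k] = np.linalg.solve(A, beta * Phi.T @ (R[:, k] * y))
            # Evidence (type II ML) update of alpha_k after the cycle, MacKay-style fixed point.
            lam = np.clip(np.linalg.eigvalsh(G), 0.0, None)
            gamma[k] = np.sum(lam / (lam + alpha[k]))
            # Clipping is an assumption that keeps the sketch numerically safe.
            alpha[k] = np.clip(gamma[k] / max(W[k] @ W[k], 1e-12), 1e-6, 1e10)
            # Variance update: plain weighted maximum likelihood with a small floor.
            resid = y - Phi @ W[k]
            var[k] = max(np.sum(R[:, k] * resid ** 2) / max(Nk[k], 1e-12), 1e-6)
    return {"pi": pi, "W": W, "var": var, "alpha": alpha, "gamma": gamma, "loglik": loglik}


if __name__ == "__main__":
    # Toy bimodal conditional density: y = +sin(3x) or -sin(3x), plus noise.
    rng = np.random.default_rng(1)
    x = rng.uniform(-1, 1, 200)
    y = np.where(rng.integers(0, 2, 200) == 1, np.sin(3 * x), -np.sin(3 * x))
    y = y + 0.1 * rng.normal(size=200)
    fit = regularized_em(random_basis(x), y)
    print("alpha:", fit["alpha"], "gamma:", fit["gamma"])
```

In this sketch, alpha_k is re-estimated once per EM cycle rather than being iterated to convergence inside each M-step. This is one way to read "adapts the hyperparameters on-line after each EM cycle." gamma_k can be read as the number of weights in kernel k that the data actually determine, and the remaining weights are shrunk toward zero by the prior.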