Reversible jump MCMC simulated annealing for neural networks

Authors:
Christophe Andrieu; Nando de Freitas; Arnaud Doucet
Affiliations:
Engineering Department, Cambridge University, Cambridge, UK; UC Berkeley Computer Science Division, Berkeley, CA; Engineering Department, Cambridge University, Cambridge, UK
Venue:
UAI'00 Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence
Year:
2000

Abstract

We propose a novel reversible jump Markov chain Monte Carlo (MCMC) simulated annealing algorithm to optimize radial basis function (RBF) networks. This algorithm enables us to maximize the joint posterior distribution of the network parameters and the number of basis functions. It performs a global search in the joint space of the parameters and number of parameters, thereby surmounting the problem of local minima. We also show that by calibrating a Bayesian model, we can obtain the classical AIC, BIC and MDL model selection criteria within a penalized likelihood framework. Finally, we show theoretically and empirically that the algorithm converges to the modes of the full posterior distribution in an efficient way.
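To make the penalized likelihood idea concrete, here is a minimal Python sketch, not the authors' algorithm. It assumes Gaussian noise, so that the fit term is N log(RSS/N) up to a constant, and it uses the textbook AIC and BIC penalties, with MDL taken to share the BIC penalty; the constants in the paper's calibration may differ. The residual table `RSS_BY_K`, the sample size `N_OBS`, and the count `PARAMS_PER_BASIS` are made-up illustrative values, and the annealed search only proposes adding or removing one basis function. The method described in the abstract additionally updates the network parameters and uses reversible jump acceptance ratios, which this toy omits.

```python
import math
import random


def penalized_cost(rss, n_obs, n_params, criterion="bic"):
    """Return -2 * max log-likelihood (Gaussian noise, up to a constant) plus a penalty."""
    fit = n_obs * math.log(rss / n_obs)
    if criterion == "aic":
        penalty = 2.0 * n_params
    elif criterion in ("bic", "mdl"):
        # Assumption: MDL is taken in the form that shares the BIC penalty.
        penalty = n_params * math.log(n_obs)
    else:
        raise ValueError(f"unknown criterion: {criterion}")
    return fit + penalty


# Hypothetical residual sums of squares for networks with k basis functions.
# These numbers are invented for illustration; they do not come from the paper.
RSS_BY_K = {1: 40.0, 2: 12.0, 3: 5.0, 4: 4.8, 5: 4.7, 6: 4.65, 7: 4.6, 8: 4.58}
N_OBS = 100           # assumed number of training points
PARAMS_PER_BASIS = 2  # assumed: one centre and one weight per basis function (1-D input)


def cost(k, criterion="bic"):
    """Penalized cost of the hypothetical k-basis-function network."""
    return penalized_cost(RSS_BY_K[k], N_OBS, PARAMS_PER_BASIS * k, criterion)


def anneal_model_order(n_iter=2000, t_start=10.0, t_end=0.01, seed=0, criterion="bic"):
    """Annealed birth/death search over k; returns the best k visited."""
    rng = random.Random(seed)
    k = min(RSS_BY_K)  # start from the smallest model
    best_k = k
    for i in range(n_iter):
        # Geometric cooling from t_start down to t_end.
        temperature = t_start * (t_end / t_start) ** (i / max(n_iter - 1, 1))
        proposal = k + rng.choice((-1, 1))  # birth (+1) or death (-1) move
        if proposal not in RSS_BY_K:
            continue  # proposal outside the allowed range: stay put
        delta = cost(proposal, criterion) - cost(k, criterion)
        # Metropolis rule: always accept improvements, sometimes accept worse moves.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            k = proposal
            if cost(k, criterion) < cost(best_k, criterion):
                best_k = k
    return best_k
```

Calling `anneal_model_order(criterion="aic")` instead of the default shows how a weaker complexity penalty can favour a larger network on the same residuals. Accepting occasional uphill moves at high temperature is what lets a search of this kind escape local minima before the cooling schedule freezes it.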