Most learning and sampling algorithms for restricted Boltzmann machines (RBMs) rely on Markov chain Monte Carlo (MCMC) methods based on Gibbs sampling. The most prominent examples are Contrastive Divergence (CD) learning and its variants, as well as Parallel Tempering (PT). The performance of these methods depends strongly on the mixing properties of the Gibbs chain. We propose a Metropolis-type MCMC algorithm based on a transition operator that maximizes the probability of state changes. It is shown that this operator induces an irreducible, aperiodic, and hence properly converging Markov chain, also for the typically used periodic update schemes. The transition operator can replace Gibbs sampling in RBM learning algorithms without computational overhead. Empirically, this leads to faster mixing and, in turn, to more accurate learning.
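The idea of a flip-maximizing transition operator can be illustrated for a single binary unit. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: given the unit's conditional probability of being on, Gibbs sampling resamples the unit independently of its current state, whereas a Metropolis-type operator proposes flipping the state and accepts with probability min(1, p(flipped)/p(current)), which makes a state change at least as likely as under Gibbs while preserving the same stationary conditional distribution.

```python
import numpy as np


def gibbs_update(s, p_on, rng):
    """Heat-bath (Gibbs) step: resample the binary unit from its
    conditional P(s=1 | rest) = p_on, ignoring the current state s."""
    return 1 if rng.random() < p_on else 0


def metropolis_flip_update(s, p_on, rng):
    """Metropolis-type step that maximizes the state-change probability:
    propose flipping s and accept with min(1, p(flipped)/p(current)).
    Satisfies detailed balance w.r.t. the conditional distribution."""
    p_cur = p_on if s == 1 else 1.0 - p_on
    p_flip = 1.0 - p_cur
    accept = 1.0 if p_flip >= p_cur else p_flip / p_cur
    return 1 - s if rng.random() < accept else s


def state_change_prob(s, p_on):
    """Probability of leaving state s under each operator (for comparison)."""
    p_cur = p_on if s == 1 else 1.0 - p_on
    p_flip = 1.0 - p_cur
    gibbs = p_flip                      # Gibbs: flip prob = prob of other state
    metro = min(1.0, p_flip / p_cur)    # Metropolis flip: always >= gibbs
    return gibbs, metro
```

For example, with `p_on = 0.3` and current state 0, Gibbs flips with probability 0.3, while the flip-maximizing operator flips with probability 0.3/0.7 ≈ 0.43, which is the faster-mixing behavior the abstract refers to. In an RBM, `p_on` would be the usual sigmoid conditional of a hidden or visible unit given the opposite layer.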