Most learning and sampling algorithms for restricted Boltzmann machines (RBMs) rely on Markov chain Monte Carlo (MCMC) methods based on Gibbs sampling. The most prominent examples are Contrastive Divergence (CD) learning and its variants, as well as Parallel Tempering (PT). The performance of these methods depends strongly on the mixing properties of the Gibbs chain. We propose a Metropolis-type MCMC algorithm based on a transition operator that maximizes the probability of state changes. It is shown that this operator induces an irreducible, aperiodic, and hence properly converging Markov chain, also for the typically used periodic update schemes. The transition operator can replace Gibbs sampling in RBM learning algorithms without computational overhead. Empirically, this leads to faster mixing and, in turn, to more accurate learning.
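The idea of a flip-maximizing transition operator can be illustrated for a single binary unit. The sketch below is an assumption-laden illustration, not the paper's exact algorithm: given the unit's conditional probability of being on, Gibbs sampling resamples the unit independently of its current state, whereas a Metropolis-type operator proposes flipping the state and accepts with probability min(1, p(flipped)/p(current)), which makes a state change at least as likely as under Gibbs while preserving the same stationary conditional distribution.

```python
import numpy as np


def gibbs_update(s, p_on, rng):
    """Heat-bath (Gibbs) step: resample the binary unit from its
    conditional P(s=1 | rest) = p_on, ignoring the current state s."""
    return 1 if rng.random() < p_on else 0


def metropolis_flip_update(s, p_on, rng):
    """Metropolis-type step that maximizes the state-change probability:
    propose flipping s and accept with min(1, p(flipped)/p(current)).
    Satisfies detailed balance w.r.t. the conditional distribution."""
    p_cur = p_on if s == 1 else 1.0 - p_on
    p_flip = 1.0 - p_cur
    accept = 1.0 if p_flip >= p_cur else p_flip / p_cur
    return 1 - s if rng.random() < accept else s


def state_change_prob(s, p_on):
    """Probability of leaving state s under each operator (for comparison)."""
    p_cur = p_on if s == 1 else 1.0 - p_on
    p_flip = 1.0 - p_cur
    gibbs = p_flip                      # Gibbs: flip prob = prob of other state
    metro = min(1.0, p_flip / p_cur)    # Metropolis flip: always >= gibbs
    return gibbs, metro
```

For example, with `p_on = 0.3` and current state 0, Gibbs flips with probability 0.3, while the flip-maximizing operator flips with probability 0.3/0.7 ≈ 0.43, which is the faster-mixing behavior the abstract refers to. In an RBM, `p_on` would be the usual sigmoid conditional of a hidden or visible unit given the opposite layer.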