Learning algorithms relying on Gibbs-sampling-based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods: Contrastive Divergence (CD) and its refined variants, Persistent CD (PCD) and Fast PCD (FPCD). Because the gradient approximations are biased, the maximum of the log-likelihood is not necessarily reached. Recently, it has been shown that CD, PCD, and FPCD can even lead to a steady decrease of the log-likelihood during learning. Using artificial data sets from the literature, we study these divergence effects in more detail. Our results indicate that the log-likelihood tends to diverge especially when the target distribution is difficult for the RBM to learn. The decrease of the likelihood cannot be detected by an increase of the reconstruction error, which has been proposed as a stopping criterion for CD learning. Weight decay with a carefully chosen weight-decay parameter can prevent divergence.