Learning algorithms relying on Gibbs-sampling-based stochastic approximations of the log-likelihood gradient have become a common way to train Restricted Boltzmann Machines (RBMs). We study three of these methods: Contrastive Divergence (CD) and its refined variants, Persistent CD (PCD) and Fast PCD (FPCD). Because the gradient approximations are biased, the maximum of the log-likelihood is not necessarily reached. Recently, it has been shown that CD, PCD, and FPCD can even lead to a steady decrease of the log-likelihood during learning. Using artificial data sets from the literature, we study these divergence effects in more detail. Our results indicate that the log-likelihood tends to diverge especially when the target distribution is difficult for the RBM to learn. The decrease of the likelihood cannot be detected by an increase of the reconstruction error, which has been proposed as a stopping criterion for CD learning. Weight decay with a carefully chosen weight-decay parameter can prevent divergence.