Justifying and generalizing contrastive divergence

Authors:
Yoshua Bengio; Olivier Delalleau
Venue:
Neural Computation
Year:
2009

Abstract

We study an expansion of the log-likelihood in undirected graphical models such as the restricted Boltzmann machine (RBM), where each term in the expansion is associated with a sample in a Gibbs chain alternating between two random variables (the visible vector and the hidden vector in RBMs). We are particularly interested in estimators of the gradient of the log-likelihood obtained through this expansion. We show that the residual term of the expansion converges to zero as the number of Gibbs steps grows. This justifies truncating the expansion, that is, running only a short Gibbs chain, which is the main idea behind the contrastive divergence (CD) estimator of the log-likelihood gradient. By truncating even more, we obtain a stochastic reconstruction error, related through a mean-field approximation to the reconstruction error often used to train autoassociators and stacked autoassociators. The derivation is not specific to the particular parametric forms used in RBMs and requires only convergence of the Gibbs chain. We present theoretical and empirical evidence linking the number of Gibbs steps k and the magnitude of the RBM parameters to the bias in the CD estimator. These experiments also suggest that the sign of the CD estimator is correct most of the time, even when the bias is large, so that CD-k (CD run with k Gibbs steps) is a good descent direction even for small k.
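The relation between the number of Gibbs steps k, the parameter magnitude, and the bias of CD-k can be illustrated on an RBM small enough to enumerate. What follows is a minimal sketch, not the authors' experimental setup. It assumes a binary RBM with 3 visible and 2 hidden units, parameters drawn from a Gaussian whose standard deviation `scale` stands in for "parameter magnitude", and a common CD-k variant in which hidden units in the gradient statistics are replaced by their conditional probabilities. All names (`TinyRBM`, `gibbs_dist`, `grad_estimate`, `compare`) are hypothetical. Instead of sampling a chain, the code propagates the exact distribution of the visible vector through k block-Gibbs steps, so it yields the *expected* CD-k update. It then compares that update with the exact log-likelihood gradient with respect to the weights, reporting the largest absolute bias and the fraction of weights whose gradient sign agrees. Both metrics are illustrative choices.

```python
import itertools
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bernoulli_prob(x, p):
    # Probability of binary vector x under independent Bernoulli units with means p
    return math.prod(pi if xi else 1.0 - pi for xi, pi in zip(x, p))


class TinyRBM:
    """Binary RBM, E(v, h) = -b.v - c.h - sum_j h_j (W[j] . v), small enough to enumerate."""
    def __init__(self, n_visible, n_hidden, scale, seed=0):
        rng = random.Random(seed)
        self.nv, self.nh = n_visible, n_hidden
        # W[j][i] couples hidden unit j to visible unit i; `scale` sets the parameter magnitude
        self.W = [[rng.gauss(0.0, scale) for _ in range(n_visible)] for _ in range(n_hidden)]
        self.b = [rng.gauss(0.0, scale) for _ in range(n_visible)]
        self.c = [rng.gauss(0.0, scale) for _ in range(n_hidden)]
        self.states_v = list(itertools.product((0, 1), repeat=n_visible))
        self.states_h = list(itertools.product((0, 1), repeat=n_hidden))

    def pre_h(self, v):  # hidden pre-activations c_j + W[j] . v
        return [self.c[j] + sum(w * vi for w, vi in zip(self.W[j], v)) for j in range(self.nh)]

    def p_h1(self, v):  # P(h_j = 1 | v) for every hidden unit j
        return [sigmoid(x) for x in self.pre_h(v)]

    def p_v1(self, h):  # P(v_i = 1 | h) for every visible unit i
        return [sigmoid(self.b[i] + sum(self.W[j][i] * h[j] for j in range(self.nh)))
                for i in range(self.nv)]

    def model_dist(self):
        # Exact P(v) proportional to exp(-F(v)), F(v) = -b.v - sum_j log(1 + exp(pre_h_j(v)))
        neg_f = [sum(bi * vi for bi, vi in zip(self.b, v))
                 + sum(math.log1p(math.exp(x)) for x in self.pre_h(v)) for v in self.states_v]
        m = max(neg_f)  # subtract the max before exponentiating, for numerical stability
        w = [math.exp(x - m) for x in neg_f]
        z = sum(w)
        return [x / z for x in w]

    def transition(self):
        # T[a][b] = P(v_next = states_v[b] | v = states_v[a]) for one v -> h -> v Gibbs step
        T = []
        for va in self.states_v:
            ph, row = self.p_h1(va), [0.0] * len(self.states_v)
            for h in self.states_h:
                w_h, pv = bernoulli_prob(h, ph), self.p_v1(h)
                for idx, vb in enumerate(self.states_v):
                    row[idx] += w_h * bernoulli_prob(vb, pv)
            T.append(row)
        return T


def expected_stat(rbm, dist):
    # E_dist[P(h_j = 1 | v) * v_i], flattened over (j, i)
    out = [0.0] * (rbm.nh * rbm.nv)
    for p, v in zip(dist, rbm.states_v):
        ph = rbm.p_h1(v)
        for j in range(rbm.nh):
            for i in range(rbm.nv):
                out[j * rbm.nv + i] += p * ph[j] * v[i]
    return out


def gibbs_dist(rbm, v0, k):
    # Exact distribution of v_k after k block-Gibbs steps started at v0 (k = 0: point mass)
    T, n = rbm.transition(), len(rbm.states_v)
    dist = [1.0 if v == tuple(v0) else 0.0 for v in rbm.states_v]
    for _ in range(k):
        dist = [sum(dist[a] * T[a][b] for a in range(n)) for b in range(n)]
    return dist


def grad_estimate(rbm, v0, neg_dist):
    # Positive phase at v0 minus the negative phase under neg_dist (per weight W[j][i])
    pos = expected_stat(rbm, gibbs_dist(rbm, v0, 0))
    return [p - q for p, q in zip(pos, expected_stat(rbm, neg_dist))]


def compare(exact, approx):
    # Largest absolute bias over all weights, and fraction of weights whose signs agree
    bias = max(abs(e - a) for e, a in zip(exact, approx))
    agree = sum((e > 0) == (a > 0) for e, a in zip(exact, approx)) / len(exact)
    return bias, agree


if __name__ == "__main__":
    v0 = (1, 0, 1)
    for scale in (0.5, 3.0):  # small vs. large parameter magnitude
        rbm = TinyRBM(3, 2, scale, seed=1)
        exact = grad_estimate(rbm, v0, rbm.model_dist())
        for k in (1, 5, 25):
            bias, agree = compare(exact, grad_estimate(rbm, v0, gibbs_dist(rbm, v0, k)))
            print(f"scale={scale} k={k:2d} max|bias|={bias:.4f} sign agreement={agree:.2f}")
```

Passing the exact model distribution as the negative phase gives the true gradient. Passing the k-step chain distribution gives the expected CD-k update. Because this Gibbs chain converges to the model distribution, the two coincide as k grows, which is consistent with the residual term vanishing. The printed numbers depend on the random seed and are not meant to reproduce any result from the paper.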