Training restricted Boltzmann machines with multi-tempering: harnessing parallelization

  • Authors:
  • Philemon Brakel; Sander Dieleman; Benjamin Schrauwen

  • Affiliations:
  • Department of Electronics and Information Systems, Ghent University, Gent, Belgium (all authors)

  • Venue:
  • ICANN'12: Proceedings of the 22nd International Conference on Artificial Neural Networks and Machine Learning, Volume Part II
  • Year:
  • 2012

Abstract

Restricted Boltzmann Machines (RBMs) are unsupervised probabilistic neural networks that can be stacked to form Deep Belief Networks. Given the recent popularity of RBMs and the increasing availability of parallel computing architectures, it is worthwhile to investigate learning algorithms for RBMs that benefit from parallel computation. In this paper, we examine two extensions of the parallel tempering algorithm, a Markov Chain Monte Carlo method for approximating the likelihood gradient. The first extension enables a more effective exchange of information among the parallel sampling chains. The second estimates gradients by averaging over chains at different temperatures. We investigate the efficiency of the proposed methods and demonstrate their usefulness on the MNIST dataset. The weighted averaging in particular appears to benefit maximum likelihood learning.
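
For context, the sketch below illustrates the standard parallel tempering baseline that the paper extends: several Gibbs chains run at different inverse temperatures, neighbouring chains propose Metropolis swaps, and the chain at the target temperature supplies negative-phase samples for the likelihood gradient. This is not the authors' multi-tempering or weighted-averaging method; all variable names, the temperature ladder, and the toy training loop are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, b, c, beta):
    """One block-Gibbs step in the tempered RBM p_beta(v, h) ~ exp(-beta * E(v, h))."""
    h = (rng.random(c.shape) < sigmoid(beta * (W.T @ v + c))).astype(float)
    v = (rng.random(b.shape) < sigmoid(beta * (W @ h + b))).astype(float)
    return v

def free_energy(v, W, b, c, beta):
    """Free energy of a visible vector under the tempered RBM (hidden units summed out)."""
    return -beta * (b @ v) - np.sum(np.logaddexp(0.0, beta * (W.T @ v + c)))

def pt_step(chains, W, b, c, betas):
    """Advance every tempered chain one Gibbs step, then propose Metropolis swaps
    between neighbouring temperatures; return the sample of the beta = 1 chain."""
    for k, beta in enumerate(betas):
        chains[k] = gibbs_step(chains[k], W, b, c, beta)
    for k in range(len(betas) - 1):
        f_kk = free_energy(chains[k], W, b, c, betas[k])
        f_nn = free_energy(chains[k + 1], W, b, c, betas[k + 1])
        f_kn = free_energy(chains[k + 1], W, b, c, betas[k])
        f_nk = free_energy(chains[k], W, b, c, betas[k + 1])
        # log acceptance ratio of swapping the visible states of chains k and k+1
        if np.log(rng.random()) < (f_kk + f_nn) - (f_kn + f_nk):
            chains[k], chains[k + 1] = chains[k + 1], chains[k]
    return chains[-1]  # betas[-1] == 1.0 is the target distribution

# Toy usage: a tiny RBM trained on random binary data with PT negative samples.
n_vis, n_hid, n_temps = 20, 10, 8
W = 0.01 * rng.standard_normal((n_vis, n_hid))
b, c = np.zeros(n_vis), np.zeros(n_hid)
betas = np.linspace(0.0, 1.0, n_temps)           # inverse-temperature ladder
chains = [rng.integers(0, 2, n_vis).astype(float) for _ in range(n_temps)]
data = rng.integers(0, 2, (100, n_vis)).astype(float)

lr = 0.05
for epoch in range(5):
    for v_pos in data:
        h_pos = sigmoid(W.T @ v_pos + c)         # positive-phase expectations
        v_neg = pt_step(chains, W, b, c, betas)  # negative-phase sample at beta = 1
        h_neg = sigmoid(W.T @ v_neg + c)
        W += lr * (np.outer(v_pos, h_pos) - np.outer(v_neg, h_neg))
        b += lr * (v_pos - v_neg)
        c += lr * (h_pos - h_neg)
```

In this baseline, only the adjacent chains exchange states and only the beta = 1 chain contributes to the gradient; the paper's extensions target precisely these two restrictions, by improving information exchange across the whole ladder and by forming a weighted average of gradient statistics over chains at different temperatures.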