Two recently proposed learning algorithms, herding and fast persistent contrastive divergence (FPCD), share the following interesting characteristic: they exploit changes in the model parameters while sampling in order to escape modes and mix better during the sampling process that is part of the learning algorithm. We justify such approaches as ways to escape modes while keeping approximately the same asymptotic distribution of the Markov chain. In that spirit, we extend FPCD using an idea borrowed from herding in order to obtain a pure sampling algorithm, which we call the rates-FPCD sampler. Interestingly, this sampler can improve the model as we collect more samples, since it optimizes a lower bound on the log-likelihood of the training data. We provide empirical evidence that this new algorithm displays substantially better and more robust mixing than Gibbs sampling.
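
The following is a minimal, illustrative sketch (not the authors' implementation) of how a rates-FPCD-style sampler could perturb fast parameters while drawing samples from a binary RBM. It assumes W, b, c are the trained weights and biases (slow parameters), rates_W, rates_b, rates_c are the average sufficient statistics of the training data, and eta, decay, k are hypothetical hyperparameters; the fast parameters are pushed toward the training rates and away from the statistics of the samples just drawn, then decayed, so the chain escapes modes while staying close to the model's distribution.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sample_bernoulli(p):
    # Draw 0/1 samples with probabilities p.
    return (rng.random(p.shape) < p).astype(p.dtype)

def rates_fpcd_sample(W, b, c, rates_W, rates_b, rates_c,
                      n_samples=100, k=1, eta=0.01, decay=0.95):
    """Sample from an RBM while updating fast parameters (illustrative sketch).

    rates_* are the data means of the sufficient statistics
    (E_data[v h^T], E_data[v], E_data[h]), estimated once on the training set.
    """
    n_vis, n_hid = W.shape
    FW, Fb, Fc = np.zeros_like(W), np.zeros_like(b), np.zeros_like(c)
    v = sample_bernoulli(np.full(n_vis, 0.5))
    samples = []
    for _ in range(n_samples):
        # Gibbs transitions use slow plus fast parameters.
        for _ in range(k):
            h = sample_bernoulli(sigmoid(v @ (W + FW) + (c + Fc)))
            v = sample_bernoulli(sigmoid((W + FW) @ h + (b + Fb)))
        samples.append(v.copy())
        # Herding-like fast-parameter update: move toward the training rates,
        # away from the statistics of the sample just drawn, then decay back
        # toward zero so the asymptotic distribution remains roughly unchanged.
        FW = decay * FW + eta * (rates_W - np.outer(v, h))
        Fb = decay * Fb + eta * (rates_b - v)
        Fc = decay * Fc + eta * (rates_c - h)
    return np.array(samples)

In this sketch the rates would be computed once from the training set (for example, the empirical means of v, of the hidden activations given v, and of their outer product); the abstract does not specify these details, so the update rule and hyperparameters above are assumptions for illustration only.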