We present a distribution model for binary vectors, called the influence combination model, and show how this model can be used as the basis for unsupervised learning algorithms for feature selection. The model can be represented by a particular type of Boltzmann machine with a bipartite graph structure that we call the combination machine. This machine is closely related to the Harmonium model defined by Smolensky. In the first part of the paper we analyze properties of this distribution representation scheme. We show that arbitrary distributions of binary vectors can be approximated by the combination model. We show how the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and how the combination machine can be used as a mechanism for detecting these patterns. We compare the combination model with the mixture model and with principal component analysis. In the second part of the paper we present two algorithms for learning the combination model from examples. The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates of the model's parameters. Here we give a closed form for this gradient that is significantly easier to compute than the corresponding gradient for the general Boltzmann machine. The second learning algorithm is a greedy method that creates the hidden units and computes their weights one at a time. This method is a variant of projection pursuit density estimation. In the third part of the paper we give experimental results for these learning methods on synthetic data and on natural data of handwritten digit images.
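The gradient ascent learner for a bipartite (hidden/visible) Boltzmann machine can be sketched on a toy problem. This is a minimal sketch, not the authors' implementation, and it rests on three assumptions:

- It uses the standard restricted Boltzmann machine parameterization. The visible biases `b`, hidden biases `c` and weight matrix `W` are hypothetical names; the paper's own notation and bias handling may differ.
- Summing out the binary hidden units gives an unnormalized marginal exp(b·x) ∏_j (1 + exp(w_j·x + c_j)).
- The hidden units are conditionally independent given x, so their expectations are sigmoid(w_j·x + c_j) in closed form. We assume this is the kind of simplification that makes the gradient cheaper than in a general Boltzmann machine.

The normalizing constant is computed by enumerating all 2^n binary vectors, which is only practical for a few input bits.

```python
import itertools

import numpy as np


def log_unnormalized(x, W, b, c):
    """Log of exp(b.x) * prod_j (1 + exp(W[j].x + c[j])), i.e. the
    unnormalized probability of x with the hidden units summed out."""
    return x @ b + np.sum(np.logaddexp(0.0, W @ x + c))


def all_vectors(n):
    """All 2**n binary vectors as rows; only feasible for small n."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)


def log_partition(W, b, c):
    """Log normalizing constant, by exact enumeration of every input vector."""
    xs = all_vectors(W.shape[1])
    return np.logaddexp.reduce([log_unnormalized(x, W, b, c) for x in xs])


def log_likelihood(data, W, b, c):
    """Average log-probability of the rows of `data` under the model."""
    scores = [log_unnormalized(x, W, b, c) for x in data]
    return np.mean(scores) - log_partition(W, b, c)


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def expected_stats(X, weights, W, c):
    """Weighted averages of h x^T, x and h, using E[h | x] = sigmoid(W x + c)."""
    H = sigmoid(X @ W.T + c)              # shape (rows, hidden)
    dW = (H * weights[:, None]).T @ X     # shape (hidden, visible)
    return dW, weights @ X, weights @ H


def gradient(data, W, b, c):
    """Exact gradient of log_likelihood: data statistics minus model statistics."""
    xs = all_vectors(W.shape[1])
    log_p = np.array([log_unnormalized(x, W, b, c) for x in xs])
    p_model = np.exp(log_p - np.logaddexp.reduce(log_p))
    p_data = np.full(len(data), 1.0 / len(data))
    pos = expected_stats(data, p_data, W, c)
    neg = expected_stats(xs, p_model, W, c)
    return tuple(d - m for d, m in zip(pos, neg))


# Toy data (hypothetical): two 4-bit patterns, each repeated twice.
data = np.array([[1, 1, 0, 0], [1, 1, 0, 0],
                 [0, 0, 1, 1], [0, 0, 1, 1]], dtype=float)
rng = np.random.default_rng(0)
W = 0.1 * rng.standard_normal((2, 4))     # 2 hidden units, 4 visible bits
b = np.zeros(4)
c = np.zeros(2)

before = log_likelihood(data, W, b, c)
for _ in range(500):                       # plain gradient ascent, step 0.1
    dW, db, dc = gradient(data, W, b, c)
    W += 0.1 * dW
    b += 0.1 * db
    c += 0.1 * dc
after = log_likelihood(data, W, b, c)
print(f"average log-likelihood: {before:.3f} -> {after:.3f}")
```

Enumeration keeps the gradient exact, so it can be checked against finite differences. For realistic input sizes, the model-expectation term would have to be approximated, for example by sampling.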