UNSUPERVISED LEARNING OF DISTRIBUTIONS ON BINARY VECTORS USING TWO LAYER NETWORKS

Authors:
Yoav Freund; David Haussler
Year:
1994

Abstract

We present a distribution model for binary vectors, called the influence combination model, and show how this model can be used as the basis for unsupervised learning algorithms for feature selection. The model can be represented by a particular type of Boltzmann machine with a bipartite graph structure that we call the combination machine. This machine is closely related to the Harmonium model defined by Smolensky. In the first part of the paper we analyze properties of this distribution representation scheme. We show that arbitrary distributions of binary vectors can be approximated by the combination model. We show how the weight vectors in the model can be interpreted as high-order correlation patterns among the input bits, and how the combination machine can be used as a mechanism for detecting these patterns. We compare the combination model with the mixture model and with principal component analysis. In the second part of the paper we present two algorithms for learning the combination model from examples. The first learning algorithm is the standard gradient ascent heuristic for computing maximum likelihood estimates of the model's parameters. Here we give a closed form for this gradient that is significantly easier to compute than the corresponding gradient for the general Boltzmann machine. The second learning algorithm is a greedy method that creates the hidden units and computes their weights one at a time. This method is a variant of projection pursuit density estimation. In the third part of the paper we give experimental results for these learning methods on synthetic data and on natural data consisting of handwritten digit images.
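The abstract does not state the model's formula, so the following is only a minimal sketch that assumes the usual restricted-Boltzmann-machine form of a bipartite Boltzmann machine with binary units, visible biases b and hidden biases c. Under that assumption, summing out the hidden units gives P(x) proportional to exp(b·x) ∏_i (1 + exp(w_i·x + c_i)). Because the graph is bipartite, the hidden units are conditionally independent given x, so the data-dependent part of the log-likelihood gradient needs only one sigmoid per hidden unit per example and no sampling. The model-dependent part is computed here by brute-force enumeration of all 2^n binary vectors, which is feasible only for small n. All function names, the bias parameterization, the learning rate and the toy data are hypothetical illustrations, not the authors' code or their exact gradient formula.

```python
import itertools
import math
import random


def softplus(a):
    # Numerically stable log(1 + exp(a)).
    return max(a, 0.0) + math.log1p(math.exp(-abs(a)))


def sigmoid(a):
    # Numerically stable logistic function 1 / (1 + exp(-a)).
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)


def hidden_inputs(x, W, c):
    # Net input w_i . x + c_i to every hidden unit i.
    return [sum(w * xj for w, xj in zip(w_i, x)) + c_i for w_i, c_i in zip(W, c)]


def log_unnormalized(x, W, b, c):
    # log[exp(b . x) * prod_i (1 + exp(w_i . x + c_i))]: hidden units summed out.
    visible = sum(bj * xj for bj, xj in zip(b, x))
    return visible + sum(softplus(a) for a in hidden_inputs(x, W, c))


def all_vectors(n):
    # Every binary vector of length n (2^n of them, so small n only).
    return list(itertools.product((0, 1), repeat=n))


def log_partition(W, b, c):
    # Exact log Z via log-sum-exp over all 2^n visible vectors.
    logs = [log_unnormalized(x, W, b, c) for x in all_vectors(len(b))]
    top = max(logs)
    return top + math.log(sum(math.exp(v - top) for v in logs))


def log_likelihood(data, W, b, c):
    # Average log P(x) over the training vectors.
    log_z = log_partition(W, b, c)
    return sum(log_unnormalized(x, W, b, c) for x in data) / len(data) - log_z


def expected_stats(weighted, W, c, n):
    # Weighted averages of x_j, s_i and s_i * x_j over (p, x) pairs, where
    # s_i = sigmoid(w_i . x + c_i) = P(hidden unit i is on | x).
    m = len(W)
    ex, es = [0.0] * n, [0.0] * m
    esx = [[0.0] * n for _ in range(m)]
    for p, x in weighted:
        s = [sigmoid(a) for a in hidden_inputs(x, W, c)]
        for j in range(n):
            ex[j] += p * x[j]
        for i in range(m):
            es[i] += p * s[i]
            for j in range(n):
                esx[i][j] += p * s[i] * x[j]
    return ex, es, esx


def gradient(data, W, b, c):
    # Gradient of the average log-likelihood: data expectation minus model expectation.
    n, m = len(b), len(W)
    data_side = [(1.0 / len(data), x) for x in data]
    log_z = log_partition(W, b, c)
    model_side = [(math.exp(log_unnormalized(x, W, b, c) - log_z), x) for x in all_vectors(n)]
    dx, ds, dsx = expected_stats(data_side, W, c, n)
    mx, ms, msx = expected_stats(model_side, W, c, n)
    gW = [[dsx[i][j] - msx[i][j] for j in range(n)] for i in range(m)]
    gb = [dx[j] - mx[j] for j in range(n)]
    gc = [ds[i] - ms[i] for i in range(m)]
    return gW, gb, gc


def train(data, num_hidden, steps=300, lr=0.1, seed=0):
    # Plain batch gradient ascent starting from small random weights.
    n = len(data[0])
    rng = random.Random(seed)
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(num_hidden)]
    b, c = [0.0] * n, [0.0] * num_hidden
    for _ in range(steps):
        gW, gb, gc = gradient(data, W, b, c)
        W = [[w + lr * g for w, g in zip(row, grow)] for row, grow in zip(W, gW)]
        b = [v + lr * g for v, g in zip(b, gb)]
        c = [v + lr * g for v, g in zip(c, gc)]
    return W, b, c


if __name__ == "__main__":
    data = [(1, 1, 0, 0), (1, 1, 0, 0), (0, 0, 1, 1), (0, 0, 1, 1), (1, 1, 1, 1)]
    W, b, c = train(data, num_hidden=2)
    print("average log-likelihood:", log_likelihood(data, W, b, c))
```

Enumeration keeps this sketch exact and easy to check against finite differences. For realistic input sizes the 2^n sum in `log_partition` and `gradient` is infeasible, and the model-side expectation would have to be approximated (for example by Gibbs sampling), while the data-side term remains exact.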