Mixture representations for inference and learning in Boltzmann machines

Authors:
Neil D. Lawrence;Christopher M. Bishop;Michael I. Jordan
Affiliations:
Computer Laboratory, Cambridge, UK;Microsoft Research, Cambridge, UK;Center for Biological and Computational Learning, Cambridge, MA
Venue:
UAI'98 Proceedings of the Fourteenth conference on Uncertainty in artificial intelligence
Year:
1998

Abstract

Boltzmann machines are undirected graphical models with two-state stochastic variables, in which the logarithms of the clique potentials are quadratic functions of the node states. They have been widely studied in the neural computing literature, although their practical applicability has been limited by the difficulty of finding an effective learning algorithm. One well-established approach, known as mean field theory, represents the stochastic distribution using a factorized approximation. However, the corresponding learning algorithm often fails to find a good solution. We conjecture that this is due to the implicit uni-modality of the mean field approximation, which is therefore unable to capture multi-modality in the true distribution. In this paper, we use variational methods to approximate the stochastic distribution using multi-modal mixtures of factorized distributions. We present results for both inference and learning to demonstrate the effectiveness of this approach.
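
The uni-modality argument can be illustrated with a small sketch. It is not the algorithm from the paper: it assumes a toy two-unit Boltzmann machine with states in {0, 1} and P(s) proportional to exp(sum_{i<j} w_ij s_i s_j + sum_i b_i s_i), and the weights, biases, and function names are chosen here purely for illustration. It runs the standard mean field fixed-point update mu_i = sigmoid(sum_j w_ij mu_j + b_i) from two starting points, and then averages the two factorized solutions with equal weights as a stand-in for a mixture. The paper instead fits the mixture by optimizing a variational bound.

```python
import itertools
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def mean_field(W, b, mu0, n_sweeps=100):
    """Sequential mean field updates mu_i = sigmoid(W[i] @ mu + b[i])."""
    mu = np.array(mu0, dtype=float)
    for _ in range(n_sweeps):
        for i in range(len(mu)):
            mu[i] = sigmoid(W[i] @ mu + b[i])
    return mu

def exact_marginals(W, b):
    """Brute-force P(s_i = 1) by enumerating all 2^n states (small n only)."""
    n = len(b)
    states = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)
    # W is symmetric with zero diagonal, so 0.5 * s^T W s = sum_{i<j} w_ij s_i s_j.
    log_p = 0.5 * np.einsum("si,ij,sj->s", states, W, states) + states @ b
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    return p @ states

# Illustrative parameters (an assumption, not from the paper): strong positive
# coupling with negative biases makes (0, 0) and (1, 1) equally probable modes.
W = np.array([[0.0, 8.0], [8.0, 0.0]])
b = np.array([-4.0, -4.0])

mu_low = mean_field(W, b, [0.1, 0.1])    # settles near the (0, 0) mode
mu_high = mean_field(W, b, [0.9, 0.9])   # settles near the (1, 1) mode
mixture = 0.5 * (mu_low + mu_high)       # equal-weight average of both solutions
exact = exact_marginals(W, b)

print("mean field (low init): ", mu_low)
print("mean field (high init):", mu_high)
print("equal-weight mixture:  ", mixture)
print("exact marginals:       ", exact)
```

In this toy case each single factorized solution commits to one mode and misreports the marginals, while the two-component average recovers them. The symmetry of the example makes equal weights appropriate; in general the mixing coefficients would have to be optimized as well.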