Mixture approximations to Bayesian networks

  • Authors:
  • Volker Tresp;Michael Haft;Reimar Hofmann

  • Affiliations:
  • Siemens AG, Corporate Technology, Neural Computation, Dept. Information and Communications, Munich, Germany;Siemens AG, Corporate Technology, Neural Computation, Dept. Information and Communications, Munich, Germany;Siemens AG, Corporate Technology, Neural Computation, Dept. Information and Communications, Munich, Germany

  • Venue:
  • UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Structure and parameters in a Bayesian network uniquely specify the probability distribution of the modeled domain. The locality of both structure and probabilistic information are the great benefits of Bayesian networks and require the modeler to only specify local information. On the other hand this locality of information might prevent the modeler -and even more any other personfrom obtaining a general overview of the important relationships within the domain. The goal of the work presented in this paper is to provide an "alternativen view on the knowledge encoded in a Bayesian network which might sometimes be very helpful for providing insights into the underlying domain. The basic idea is to calculate a mixture approximation to the probability distribution represented by the Bayesian network. The mixture component densities can be thought of as representing typical scenarios implied by the Bayesian model, providing intuition about the basic relationships. As an additional benefit, performing inference in the approximate model is very simple and intuitive and can provide additional insights. The computational complexity for the calculation of the mixture approximations critically depends on the measure which defines the distance between the probability distribution represented by the Bayesian network and the approximate distribution. Both the KL-divergence and the backward KL-divergence lead to inefficient algorithms. Incidentally, the latter is used in recent work on mixtures of mean field solutions to which the work presented here is closely related. We show, however, that using a mean squared error cost function leads to update equations which can be solved using the junction tree algorithm. We conclude that the mean squared error cost function can be used for Bayesian networks in which inference based on the junction tree is tractable. For large networks, however, one may have to rely on mean field approximations.