Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables

  • Authors:
  • David Maxwell Chickering; David Heckerman

  • Affiliations:
  • Microsoft Research, Redmond, WA 98052-6399. E-mail: dmax@microsoft.com, heckerma@microsoft.com

  • Venue:
  • Machine Learning - Special issue on learning with probabilistic representations
  • Year:
  • 1997


Abstract

We discuss Bayesian methods for model averaging and model selection among Bayesian-network models with hidden variables. In particular, we examine large-sample approximations for the marginal likelihood of naive-Bayes models in which the root node is hidden. Such models are useful for clustering or unsupervised learning. We consider a Laplace approximation and the less accurate but more computationally efficient approximation known as the Bayesian Information Criterion (BIC), which is equivalent to Rissanen's (1987) Minimum Description Length (MDL). Also, we consider approximations that ignore some off-diagonal elements of the observed information matrix and an approximation proposed by Cheeseman and Stutz (1995). We evaluate the accuracy of these approximations using a Monte-Carlo gold standard. In experiments with artificial and real examples, we find that (1) none of the approximations are accurate when used for model averaging, (2) all of the approximations, with the exception of BIC/MDL, are accurate for model selection, (3) among the accurate approximations, the Cheeseman–Stutz and Diagonal approximations are the most computationally efficient, (4) all of the approximations, with the exception of BIC/MDL, can be sensitive to the prior distribution over model parameters, and (5) the Cheeseman–Stutz approximation can be more accurate than the other approximations, including the Laplace approximation, in situations where the parameters in the maximum a posteriori configuration are near a boundary.
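For orientation, the approximations compared in the abstract take the following standard large-sample forms; this is a sketch from the general literature rather than a quotation from the paper, and the notation (model m, data D, MAP parameters \tilde{\theta}, d free parameters, N cases, observed information matrix A) is ours.

    % Laplace approximation to the log marginal likelihood:
    \log p(D \mid m) \approx \log p(D \mid \tilde{\theta}, m)
        + \log p(\tilde{\theta} \mid m)
        + \frac{d}{2} \log 2\pi
        - \frac{1}{2} \log |A|

    % BIC/MDL: drop the O(1) terms of the Laplace approximation,
    % keeping only the terms that grow with the sample size N:
    \log p(D \mid m) \approx \log p(D \mid \tilde{\theta}, m)
        - \frac{d}{2} \log N

    % Cheeseman--Stutz: correct the marginal likelihood of the
    % completed data D' (data filled in with expected sufficient
    % statistics at \tilde{\theta}) by the likelihood ratio of the
    % observed and completed data at \tilde{\theta}:
    \log p(D \mid m) \approx \log p(D' \mid m)
        + \log p(D \mid \tilde{\theta}, m)
        - \log p(D' \mid \tilde{\theta}, m)

The Diagonal approximation mentioned above replaces |A| in the Laplace form with the product of the diagonal entries of A, avoiding the cost of computing and inverting the full observed information matrix.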