Data mining tasks and methods: Probabilistic and causal networks: mining for probabilistic networks

  • Author: Peter L. Spirtes
  • Affiliation: Professor of Philosophy, Carnegie Mellon University, Pittsburgh, Pennsylvania
  • Venue: Handbook of data mining and knowledge discovery
  • Year: 2002

Abstract

This article provides an overview of how to handle uncertainty about which Bayesian network to use when calculating the effect of an ideal manipulation or performing a classification. The Bayesian approach is to place a prior distribution over all Bayesian networks and their parameters, and then use it to calculate a posterior distribution over the quantity of interest. This is in general computationally infeasible because of the huge number of different Bayesian networks over a given set of variables. Other approaches approximate the Bayesian answer using Markov chain Monte Carlo algorithms, or using Bayesian model averaging restricted to a few good Bayesian networks, with all other networks ignored in order to simplify the calculations. The latter approach requires searching the vast space of Bayesian networks for the good ones. Several methods for scoring Bayesian networks and several search algorithms are described. It is shown how the problems of equivalent models and latent variables complicate both searching and scoring. Finally, it is shown how searching over equivalence classes of Bayesian networks, instead of over individual Bayesian networks, can simplify both scoring and search.
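As a rough illustration of the scoring and averaging ideas mentioned in the abstract, the following Python sketch scores three candidate networks over two binary variables and turns the scores into approximate posterior weights. It is not taken from the article. The choice of the BIC score, the uniform prior over the three candidate structures, the toy data, and the names `bic_score` and `posterior_weights` are all assumptions made for the example.

```python
import math
from collections import Counter


def bic_score(data, parents):
    """BIC score of a discrete Bayesian network (higher is better).

    data:    list of dicts mapping variable name -> observed value
    parents: dict mapping variable name -> tuple of parent names (the DAG)
    """
    n = len(data)
    score = 0.0
    for var, pa in parents.items():
        # Counts of (parent configuration, value) and of parent configuration.
        joint = Counter((tuple(row[p] for p in pa), row[var]) for row in data)
        marg = Counter(tuple(row[p] for p in pa) for row in data)
        # Log-likelihood at the maximum likelihood parameter estimates.
        loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
        # Free parameters: (states - 1) for each parent configuration.
        n_states = len({row[var] for row in data})
        n_configs = 1
        for p in pa:
            n_configs *= len({row[p] for row in data})
        k = (n_states - 1) * n_configs
        score += loglik - 0.5 * k * math.log(n)
    return score


def posterior_weights(scores):
    """Normalize exp(score) over the candidates.

    Assumes a uniform prior over the listed structures and treats BIC as an
    approximation to the log marginal likelihood.
    """
    top = max(scores.values())
    unnorm = {g: math.exp(s - top) for g, s in scores.items()}
    total = sum(unnorm.values())
    return {g: w / total for g, w in unnorm.items()}


# Hypothetical toy data: X and Y are binary and positively associated.
data = (
    [{"X": 0, "Y": 0}] * 40
    + [{"X": 0, "Y": 1}] * 10
    + [{"X": 1, "Y": 0}] * 10
    + [{"X": 1, "Y": 1}] * 40
)

# Three candidate structures over {X, Y}.
candidates = {
    "X->Y": {"X": (), "Y": ("X",)},
    "Y->X": {"Y": (), "X": ("Y",)},
    "empty": {"X": (), "Y": ()},
}

scores = {name: bic_score(data, dag) for name, dag in candidates.items()}
weights = posterior_weights(scores)

for name in candidates:
    print(f"{name:6s} BIC = {scores[name]:9.3f}  weight = {weights[name]:.4f}")
```

In this sketch the two networks X->Y and Y->X receive the same BIC score, because they represent the same set of distributions. This is a small instance of the equivalent-models problem noted in the abstract, and it suggests why scoring an equivalence class once can be simpler than scoring each of its members.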