Learning the Causal Structure of Overlapping Variable Sets

  • Authors:
  • David Danks

  • Venue:
  • DS '02 Proceedings of the 5th International Conference on Discovery Science
  • Year:
  • 2002

Abstract

In many real-world applications of machine learning and data mining techniques, one must separate the variables under consideration into multiple subsets (perhaps to reduce computational complexity, or because of a shift in focus during data collection and analysis). In this paper, we use the framework of Bayesian networks to examine the problem of integrating the learning outputs for multiple overlapping datasets. In particular, we provide rules for extracting causal information about the true (unknown) Bayesian network from the previously learned (partial) Bayesian networks. We also provide the SLPR algorithm, which efficiently uses these previously learned Bayesian networks to guide learning of the full structure. A complexity analysis of the "worst-case" scenario for the SLPR algorithm reveals that the algorithm is always less complex than a comparable "reference" algorithm (though no absolute optimality proof is known). Although no "expected-case" analysis is given, the complexity analysis suggests that (given the currently available set of algorithms) one should always use the SLPR algorithm, regardless of the underlying generating structure. The results provided in this paper point to a wide range of open questions, which are briefly discussed.
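The kind of integration problem the abstract describes can be illustrated with a toy sketch. This is a hypothetical, naive consistency check over edges learned from overlapping variable subsets, not the paper's actual extraction rules or the SLPR algorithm; the function name and data layout are assumptions for illustration only.

```python
# Toy illustration (NOT the paper's SLPR rules): combine directed edges
# learned from Bayesian networks over overlapping variable subsets,
# keeping an edge only when every partial network that covers both of
# its endpoints also found that edge.

def integrate(partial_results):
    """partial_results: list of (variables, edges) pairs, where
    `variables` is the subset a network was learned over and `edges`
    is the set of directed edges (a, b) it contains."""
    candidates = set()
    for _, edges in partial_results:
        candidates |= edges
    kept = set()
    for a, b in candidates:
        # An edge survives only if no covering network disagrees.
        if all((a, b) in edges
               for variables, edges in partial_results
               if a in variables and b in variables):
            kept.add((a, b))
    return kept

# Two partial networks over overlapping subsets {X, Y, Z} and {Y, Z, W}.
nets = [({"X", "Y", "Z"}, {("X", "Y"), ("Y", "Z")}),
        ({"Y", "Z", "W"}, {("Z", "W")})]
# ("Y", "Z") is dropped: the second network covers Y and Z but lacks it.
print(sorted(integrate(nets)))  # [('X', 'Y'), ('Z', 'W')]
```

The sketch shows why naive edge unions are unsound: two partial networks can disagree about a pair of variables they both cover, which is exactly the situation the paper's rules are designed to resolve correctly.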