Finding optimal Bayesian networks

Authors:
David Maxwell Chickering;Christopher Meek
Affiliations:
Microsoft Research, Redmond, WA;Microsoft Research, Redmond, WA
Venue:
UAI'02 Proceedings of the Eighteenth conference on Uncertainty in artificial intelligence
Year:
2002


Probabilistic reasoning in intelligent systems: networks of plausible inference

Learning Bayesian Networks: The Combination of Knowledge and Statistical Data

Machine Learning
Finding a path is harder than finding a tree

Journal of Artificial Intelligence Research
Learning polytrees

UAI'99 Proceedings of the Fifteenth conference on Uncertainty in artificial intelligence
A transformational characterization of equivalent Bayesian network structures

UAI'95 Proceedings of the Eleventh conference on Uncertainty in artificial intelligence

Optimal structure identification with greedy search

The Journal of Machine Learning Research
Learning Factor Graphs in Polynomial Time and Sample Complexity

The Journal of Machine Learning Research
Consistent Feature Selection for Pattern Recognition in Polynomial Time

The Journal of Machine Learning Research
Towards scalable and data efficient learning of Markov boundaries

International Journal of Approximate Reasoning
Learning an L1-regularized Gaussian Bayesian network in the equivalence class space

IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Practically perfect

UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
On local optima in learning Bayesian networks

UAI'03 Proceedings of the Nineteenth conference on Uncertainty in Artificial Intelligence
Parent assignment is hard for the MDL, AIC, and NML costs

COLT'06 Proceedings of the 19th annual conference on Learning Theory
Review: learning Bayesian networks: Approaches and issues

The Knowledge Engineering Review
Simultaneous learning of instantaneous and time-delayed genetic interactions using novel information theoretic scoring technique

ICONIP'11 Proceedings of the 18th international conference on Neural Information Processing - Volume Part II
Finding consensus Bayesian network structures

Journal of Artificial Intelligence Research
Estimating a causal order among groups of variables in linear models

ICANN'12 Proceedings of the 22nd international conference on Artificial Neural Networks and Machine Learning - Volume Part II
mDBN: motif based learning of gene regulatory networks using dynamic Bayesian networks

Proceedings of the 15th annual conference on Genetic and evolutionary computation
Learning AMP chain graphs and some marginal models thereof under faithfulness

International Journal of Approximate Reasoning


Abstract

In this paper, we derive optimality results for greedy Bayesian-network search algorithms that perform single-edge modifications at each step and use asymptotically consistent scoring criteria. Our results extend those of Meek (1997) and Chickering (2002), who demonstrated that, in the limit of large datasets, if the generative distribution is perfect with respect to a DAG defined over the observable variables, such search algorithms will identify this optimal (i.e., generative) DAG model. We relax their assumption about the generative distribution and assume only that this distribution satisfies the composition property over the observable variables, which is a more realistic assumption for real domains. Under this assumption, we guarantee that the search algorithms identify an inclusion-optimal model; that is, a model that (1) contains the generative distribution and (2) has no sub-model that contains this distribution. In addition, we show that the composition property is guaranteed to hold whenever the dependence relationships in the generative distribution can be characterized by paths between singleton elements in some generative graphical model (e.g., a DAG, a chain graph, or a Markov network), even when the generative model includes unobserved variables and even when the observed data are subject to selection bias.
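
To make the phrase "greedy search with single-edge modifications" concrete, here is a minimal Python sketch of hill climbing over DAGs, where each step adds, deletes, or reverses one edge. It is an illustration only, not the authors' algorithm, and the guarantees stated in the abstract are not claimed for it. The function names, the choice of BIC as the consistent scoring criterion, the restriction to binary variables, and the toy data generator are all assumptions made for this example.

```python
import math
import random
from collections import Counter
from itertools import permutations

def bic_family(data, child, parents):
    """BIC term for one node given its parents (assumes binary variables)."""
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[pa]) for (pa, _), c in joint.items())
    n_params = 2 ** len(parents)  # one free parameter per parent configuration
    return loglik - 0.5 * math.log(n) * n_params

def bic(data, parents):
    """Decomposable score: sum of per-node terms."""
    return sum(bic_family(data, v, sorted(ps)) for v, ps in parents.items())

def is_acyclic(parents):
    """Repeatedly strip nodes that have no parents among the remaining nodes."""
    remaining = set(parents)
    while remaining:
        free = [v for v in remaining if not (parents[v] & remaining)]
        if not free:
            return False
        remaining -= set(free)
    return True

def neighbors(parents):
    """Yield every DAG reachable by one edge addition, deletion, or reversal."""
    for x, y in permutations(parents, 2):
        cand = {v: set(ps) for v, ps in parents.items()}
        if x in parents[y]:
            cand[y].discard(x)            # delete x -> y
            yield cand
            rev = {v: set(ps) for v, ps in cand.items()}
            rev[x].add(y)                 # reverse to y -> x
            if is_acyclic(rev):
                yield rev
        elif y not in parents[x]:
            cand[y].add(x)                # add x -> y
            if is_acyclic(cand):
                yield cand

def greedy_search(data, variables):
    """Start from the empty graph; take the best single-edge move until none helps."""
    current = {v: set() for v in variables}
    current_score = bic(data, current)
    while True:
        best, best_score = None, current_score
        for cand in neighbors(current):
            s = bic(data, cand)
            if s > best_score + 1e-9:
                best, best_score = cand, s
        if best is None:
            return current, current_score
        current, current_score = best, best_score

# Toy data (hypothetical): a noisy chain A -> B -> C over binary variables.
rng = random.Random(0)
data = []
for _ in range(500):
    a = rng.random() < 0.5
    b = a if rng.random() < 0.9 else not a
    c = b if rng.random() < 0.9 else not b
    data.append({"A": int(a), "B": int(b), "C": int(c)})

dag, score = greedy_search(data, ["A", "B", "C"])
print({v: sorted(ps) for v, ps in dag.items()}, round(score, 2))
```

The search stops at a local optimum of the score. The sketch searches over individual DAGs for simplicity; it does not attempt a search over equivalence classes of DAGs.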