We take a peek "under the hood" of constraint-based learning of graphical models such as Bayesian Networks. This mainstream approach to learning is founded on performing statistical tests of conditional independence. In all prior work, however, the tests employed for categorical data are only asymptotically correct, i.e., they converge to the exact p-value only in the sample limit. In this paper we present, evaluate, and compare exact tests, based on standard, adjustable, and semi-parametric Monte-Carlo permutation testing procedures appropriate for small sample sizes. We demonstrate that (a) permutation testing is calibrated, i.e., the actual Type I error matches the significance level α set by the user, which is not the case with asymptotic tests; (b) permutation testing leads to more robust structure learning; and (c) permutation testing allows learning networks from multiple datasets sharing a common underlying structure but different distribution functions (e.g., continuous vs. discrete); we name this problem the Bayesian Network Meta-Analysis problem. In contrast, asymptotic tests may lead to erratic learning behavior in this task, with error increasing with total sample size. The semi-parametric permutation procedure we propose is a reasonable approximation of the basic procedure using 5000 permutations, while being only 10-20 times slower than the asymptotic tests for small sample sizes. Thus, this test should be practical in most graphical-model learning problems and could substitute for asymptotic tests. The conclusions of our studies have ramifications not only for learning Bayesian Networks but also for other graphical models, and for related causality-based variable selection algorithms, such as HITON. The code is available at mensxmachina.org.
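To make the idea concrete, the basic (non-adjustable, non-semi-parametric) Monte-Carlo permutation test of conditional independence for categorical data can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: it uses the G² (likelihood-ratio) statistic and permutes X within each stratum of the conditioning variable Z, which preserves the marginals of X given Z while destroying any residual dependence between X and Y. The function names and the add-one p-value smoothing are our own choices.

```python
import numpy as np

def g2_statistic(x, y, z, levels):
    """G^2 (likelihood-ratio) statistic for testing X independent of Y given Z,
    where x, y, z are integer-coded categorical arrays and levels = (kx, ky, kz)."""
    kx, ky, kz = levels
    g2 = 0.0
    for zi in range(kz):                       # sum contributions per stratum of Z
        mask = (z == zi)
        n = mask.sum()
        if n == 0:
            continue
        counts = np.zeros((kx, ky))
        np.add.at(counts, (x[mask], y[mask]), 1)
        rows = counts.sum(axis=1, keepdims=True)
        cols = counts.sum(axis=0, keepdims=True)
        expected = rows * cols / n             # expected counts under independence
        nz = counts > 0                        # 0 * log(0) contributes nothing
        g2 += 2.0 * (counts[nz] * np.log(counts[nz] / expected[nz])).sum()
    return g2

def permutation_ci_test(x, y, z, levels, n_perm=5000, seed=0):
    """Monte-Carlo permutation p-value: shuffle X within each stratum of Z and
    count how often the permuted statistic reaches the observed one."""
    rng = np.random.default_rng(seed)
    observed = g2_statistic(x, y, z, levels)
    exceed = 0
    for _ in range(n_perm):
        xp = x.copy()
        for zi in range(levels[2]):
            idx = np.flatnonzero(z == zi)
            xp[idx] = rng.permutation(xp[idx])  # permute only within this stratum
        if g2_statistic(xp, y, z, levels) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)          # add-one smoothing keeps p > 0

# Illustrative usage: X and Y strongly dependent vs. independent, given binary Z.
rng = np.random.default_rng(1)
z = rng.integers(0, 2, 300)
x = rng.integers(0, 2, 300)
y_dep = x ^ (rng.random(300) < 0.1).astype(int)   # Y is a noisy copy of X
y_ind = rng.integers(0, 2, 300)                   # Y unrelated to X
p_dep = permutation_ci_test(x, y_dep, z, (2, 2, 2), n_perm=200)
p_ind = permutation_ci_test(x, y_ind, z, (2, 2, 2), n_perm=200)
```

Unlike the asymptotic G² test, whose p-value relies on the chi-square approximation being accurate, this procedure estimates the exact permutation p-value directly; the paper's semi-parametric variant reduces the number of permutations needed to approximate the 5000-permutation baseline.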