Permutation testing improves Bayesian network learning

Authors:
Ioannis Tsamardinos;Giorgos Borboudakis
Affiliations:
Computer Science Department, University of Crete and Institute of Computer Science, Foundation for Research and Technology, Hellas;Computer Science Department, University of Crete and Institute of Computer Science, Foundation for Research and Technology, Hellas
Venue:
ECML PKDD'10 Proceedings of the 2010 European conference on Machine learning and knowledge discovery in databases: Part III
Year:
2010

Abstract

We take a peek "under the hood" of constraint-based learning of graphical models such as Bayesian Networks. This mainstream approach to learning is founded on statistical tests of conditional independence. In all prior work, however, the tests employed for categorical data are only asymptotically correct, i.e., they converge to the exact p-value only in the large-sample limit. In this paper we present, evaluate, and compare exact tests based on standard, adjustable, and semi-parametric Monte-Carlo permutation testing procedures that are appropriate for small sample sizes. We demonstrate that (a) permutation testing is calibrated, i.e., the actual Type I error matches the significance level α set by the user, which is not the case with asymptotic tests; (b) permutation testing leads to more robust structure learning; and (c) permutation testing allows learning networks from multiple datasets that share a common underlying structure but have different distribution functions (e.g., continuous vs. discrete), a task we name the Bayesian Network Meta-Analysis problem. In this task, by contrast, asymptotic tests may lead to erratic learning behavior, with the error increasing as the total sample size grows. The semi-parametric permutation procedure we propose is a reasonable approximation of the basic procedure with 5000 permutations, while being only 10 to 20 times slower than the asymptotic tests for small sample sizes. This test should therefore be practical in most graphical learning problems and could substitute for asymptotic tests. The conclusions of our studies have ramifications not only for learning Bayesian Networks but also for other graphical models and for related causal-based variable selection algorithms, such as HITON. The code is available at mensxmachina.org.
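To make the basic (non-adjustable, non-semi-parametric) procedure concrete, here is a minimal sketch of a Monte-Carlo permutation test of conditional independence X ⊥ Y | Z for categorical data. It assumes the G² statistic that is commonly used in constraint-based learning. It also assumes that the null distribution is simulated by shuffling X within each stratum of Z, which preserves the X-Z and Y-Z margins. The function names `g_statistic` and `permutation_ci_test` are hypothetical. This is not the authors' code (which is at mensxmachina.org), and it does not implement the adjustable or semi-parametric variants.

```python
import math
import random
from collections import Counter, defaultdict


def g_statistic(x, y, z):
    """G^2 statistic for X independent of Y given Z (categorical data).

    Assumed form: G = 2 * sum n_xyz * ln(n_xyz * n_z / (n_xz * n_yz)),
    summed over observed cells only (empty cells contribute 0).
    """
    n_xyz = Counter(zip(x, y, z))
    n_xz = Counter(zip(x, z))
    n_yz = Counter(zip(y, z))
    n_z = Counter(z)
    g = 0.0
    for (xi, yi, zi), n in n_xyz.items():
        g += n * math.log(n * n_z[zi] / (n_xz[(xi, zi)] * n_yz[(yi, zi)]))
    return 2.0 * g


def permutation_ci_test(x, y, z, n_perm=5000, seed=0):
    """Monte-Carlo permutation p-value for X independent of Y given Z (hypothetical helper)."""
    rng = random.Random(seed)
    g_obs = g_statistic(x, y, z)
    # Group sample indices by the value of the conditioning variable Z.
    strata = defaultdict(list)
    for i, zi in enumerate(z):
        strata[zi].append(i)
    x_perm = list(x)
    exceed = 0
    for _ in range(n_perm):
        # Shuffle X within each stratum: this keeps the X-Z and Y-Z margins
        # but destroys any X-Y dependence that remains once Z is fixed.
        for idx in strata.values():
            vals = [x[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                x_perm[i] = v
        # Small tolerance so that ties with the observed statistic are counted.
        if g_statistic(x_perm, y, z) >= g_obs - 1e-12:
            exceed += 1
    # Counting the observed data as one permutation keeps the p-value in (0, 1].
    return (exceed + 1) / (n_perm + 1)


# Strong X-Y dependence with a constant Z: few permutations reach G_obs.
x = [0, 1] * 20
print(permutation_ci_test(x, list(x), [0] * 40, n_perm=200, seed=1))
```

Because the p-value is obtained by simulation rather than from a χ² approximation with (|X|-1)(|Y|-1)|Z| degrees of freedom, this procedure does not rely on large expected cell counts. That is the motivation for permutation testing at small sample sizes.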