Parallel Bayesian network structure learning with application to gene networks
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Hi-index | 0.00 |
Bayesian networks enable formal probabilistic reasoning on a set of interacting variables of a domain, and have been shown to have broad applicability. More specifically, in bioinformatics Bayesian networks are used to model gene interactions. Learning the structure of a Bayesian network is an NP-hard problem making it necessary to employ heuristics for solving large-scale problems. In this paper, we present parallel algorithms for two problems that arise in relation with network structure learning and analysis: (i) the discovery of all direct causal relations for each variable, i.e., the set of parents and children of each node in the corresponding Bayesian network, and (ii) the computation of Markov boundary of each variable, defined as the minimal set of variables that shield the target variable from all other variables in the domain. Our parallel algorithms are based on state-of-the art constraint-based heuristic optimization methods. They are shown to be work-optimal and communication efficient, and exhibit nearly perfect scaling.