Towards integrative causal analysis of heterogeneous data sets and studies

Authors:
Ioannis Tsamardinos; Sofia Triantafillou; Vincenzo Lagani
Affiliations:
Institute of Computer Science, Foundation for Research and Technology, Hellas, Heraklion, Crete, Greece and Department of Computer Science, University of Crete;Institute of Computer Science, Foundation for Research and Technology, Hellas, Heraklion, Crete, Greece and Department of Computer Science, University of Crete;Institute of Computer Science, Foundation for Research and Technology, Hellas, Heraklion, Crete, Greece
Venue:
The Journal of Machine Learning Research
Year:
2012

Abstract

We present methods that predict the presence and strength of conditional and unconditional dependencies (correlations) between two variables Y and Z that are never jointly measured on the same samples, based on multiple data sets that measure a set of common variables. The algorithms are specializations of prior work on learning causal structures from overlapping variable sets. The same problem has also been addressed in the field of statistical matching. The proposed methods are applied to a wide range of domains and are shown to accurately predict the presence of thousands of dependencies. Compared against prototypical statistical matching algorithms, and within the scope of our experiments, the proposed algorithms make predictions that are better correlated with the sample estimates of the unknown parameters on test data; this is particularly the case when the number of commonly measured variables is low. The enabling idea behind the methods is to induce one or all causal models that are simultaneously consistent with (fit) all available data sets and prior knowledge, and then to reason with them. This allows constraints stemming from causal assumptions (e.g., Causal Markov Condition, Faithfulness) to propagate. Several methods have been developed based on this idea, for which we propose the unifying name Integrative Causal Analysis (INCA). A contrived example is presented that demonstrates the theoretical potential to develop more general methods for co-analyzing heterogeneous data sets. The computational experiments with the novel methods provide evidence that causally-inspired assumptions such as Faithfulness often hold to a good degree of approximation in many real systems and could be exploited for statistical inference. Code, scripts, and data are available at www.mensxmachina.org.
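The problem setting in the abstract can be illustrated with a small simulation. The sketch below is not the INCA algorithm from the paper. It shows only a simple statistical-matching-style baseline, under the assumption that Y and Z are independent given a single common variable X and that all relations are linear-Gaussian, in which case corr(Y, Z) = corr(X, Y) * corr(X, Z). All names (`pearson`, `predict_corr_given_common`, `simulate_study`) and all coefficients are hypothetical and chosen for illustration.

```python
import math
import random


def pearson(a, b):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def predict_corr_given_common(r_xy, r_xz):
    """Predicted corr(Y, Z), ASSUMING Y and Z are independent given X
    in a linear-Gaussian model (an assumption, not the paper's rule)."""
    return r_xy * r_xz


def simulate_study(n, rng):
    """Draw n samples from the assumed structure Y <- X -> Z."""
    rows = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = 0.8 * x + 0.6 * rng.gauss(0.0, 1.0)  # corr(X, Y) = 0.8
        z = 0.6 * x + 0.8 * rng.gauss(0.0, 1.0)  # corr(X, Z) = 0.6
        rows.append((x, y, z))
    return rows


rng = random.Random(0)

# Study 1 records only (X, Y); study 2 records only (X, Z).
study1 = [(x, y) for x, y, _ in simulate_study(5000, rng)]
study2 = [(x, z) for x, _, z in simulate_study(5000, rng)]

x1, y1 = zip(*study1)
x2, z2 = zip(*study2)
predicted = predict_corr_given_common(pearson(x1, y1), pearson(x2, z2))

# A held-out sample with Y and Z measured together exists here only
# because the data are simulated; it plays the role of test data.
_, y3, z3 = zip(*simulate_study(5000, rng))
held_out = pearson(y3, z3)

print(f"predicted corr(Y, Z): {predicted:.3f}")
print(f"held-out  corr(Y, Z): {held_out:.3f}")
```

In this simulated structure the population value of corr(Y, Z) is 0.8 * 0.6 = 0.48, so both printed numbers should land near it. When Y and Z are not independent given the common variables, this baseline can be badly off.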