Independence of Conditionals (IC) has recently been proposed as a basic rule for causal structure learning: if a Bayesian network represents the causal structure, its Conditional Probability Distributions (CPDs) should be algorithmically independent. In this paper we compare IC with causal faithfulness (FF), which states that only those conditional independences implied by the causal Markov condition hold true; the latter is a basic postulate in common approaches to causal structure learning. The common spirit of FF and IC is to reject causal graphs for which the joint distribution looks 'non-generic'. The difference lies in the notion of genericity: FF sometimes rejects models merely because one of the CPDs is simple, for instance when the CPD describes a deterministic relation. IC does not behave in this undesirable way; it rejects a model only when there is a non-generic relation between different CPDs even though each CPD looks generic when considered separately. Moreover, it detects relations between CPDs that cannot be captured by conditional independences. IC therefore helps to distinguish causal graphs that induce the same conditional independences, i.e., that belong to the same Markov equivalence class. The usual justification of FF implicitly assumes a prior that is a probability density on the parameter space. IC can instead be justified by Solomonoff's universal prior, which assigns non-zero probability to those points in parameter space that have a finite description. In this way, it favours simple CPDs and therefore respects Occam's razor. Since Kolmogorov complexity is uncomputable, IC is not directly applicable in practice. We argue that it is nevertheless helpful, since it has already served as inspiration and justification for novel causal inference algorithms.
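The deterministic-CPD scenario in which FF fails can be made concrete with a toy chain X -> Y -> Z over binary variables, where the CPD P(Y|X) is deterministic (Y = X). The sketch below is illustrative (the particular distribution and helper functions are our own, not taken from the paper): it enumerates the joint distribution and checks conditional independences exhaustively. The Markov condition for the chain implies X &#8869; Z | Y, but the additional independence Y &#8869; Z | X also holds, purely because Y is a function of X; since that independence is not implied by the Markov condition, FF would reject the true graph, even though nothing about the causal structure is wrong.

```python
import itertools

# Toy chain X -> Y -> Z over binary variables.
# The CPD P(Y|X) is deterministic (Y = X), i.e. a "simple" CPD.
p_x = {0: 0.6, 1: 0.4}
p_y_given_x = {(0, 0): 1.0, (1, 0): 0.0, (0, 1): 0.0, (1, 1): 1.0}  # key (y, x)
p_z_given_y = {(0, 0): 0.8, (1, 0): 0.2, (0, 1): 0.3, (1, 1): 0.7}  # key (z, y)

# Joint distribution P(x, y, z) = P(x) P(y|x) P(z|y).
joint = {
    (x, y, z): p_x[x] * p_y_given_x[(y, x)] * p_z_given_y[(z, y)]
    for x, y, z in itertools.product((0, 1), repeat=3)
}

def marginal(idx):
    """Marginal distribution over the variables at positions idx (0=X, 1=Y, 2=Z)."""
    m = {}
    for assignment, p in joint.items():
        key = tuple(assignment[i] for i in idx)
        m[key] = m.get(key, 0.0) + p
    return m

def cond_indep(a, b, c, tol=1e-12):
    """Check A indep B given C, i.e. P(a,b,c) P(c) = P(a,c) P(b,c) for all values."""
    p_abc = marginal((a, b, c))
    p_ac, p_bc, p_c = marginal((a, c)), marginal((b, c)), marginal((c,))
    return all(
        abs(p_abc[(va, vb, vc)] * p_c[(vc,)] - p_ac[(va, vc)] * p_bc[(vb, vc)]) < tol
        for va, vb, vc in itertools.product((0, 1), repeat=3)
    )

# Implied by the causal Markov condition for X -> Y -> Z:
print(cond_indep(0, 2, 1))  # X indep Z | Y  -> True
# NOT implied by the Markov condition; holds only because Y = X is deterministic,
# so faithfulness rejects the (true) chain:
print(cond_indep(1, 2, 0))  # Y indep Z | X  -> True
```

Replacing the deterministic P(Y|X) with any non-degenerate CPD makes the second check fail, which is exactly the sense in which the extra independence is an artifact of one CPD being simple rather than of a non-generic relation between different CPDs.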