Automatic identification of quasi-experimental designs for discovering causal knowledge

Authors:
David D. Jensen; Andrew S. Fast; Brian J. Taylor; Marc E. Maier
Affiliations:
University of Massachusetts Amherst, Amherst, MA, USA; University of Massachusetts Amherst, Amherst, MA, USA; University of Massachusetts Amherst, Amherst, MA, USA; University of Massachusetts Amherst, Amherst, MA, USA
Venue:
Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2008

Abstract

Researchers in the social and behavioral sciences routinely rely on quasi-experimental designs to discover knowledge from large databases. Quasi-experimental designs (QEDs) exploit fortuitous circumstances in non-experimental data to identify situations (sometimes called "natural experiments") that provide the equivalent of experimental control and randomization. QEDs allow researchers in domains as diverse as sociology, medicine, and marketing to draw reliable inferences about causal dependencies from non-experimental data. Unfortunately, identifying and exploiting QEDs has remained a painstaking manual activity, requiring researchers to scour available databases and apply substantial knowledge of statistics. However, recent advances in the expressiveness of databases, and increases in their size and complexity, provide the necessary conditions to automatically identify QEDs. In this paper, we describe the first system to discover knowledge by applying quasi-experimental designs that were identified automatically. We demonstrate that QEDs can be identified in a traditional database schema and that such identification requires only a small number of extensions to that schema, knowledge about quasi-experimental design encoded in first-order logic, and a theorem-proving engine. We describe several key innovations necessary to enable this system, including methods for automatically constructing appropriate experimental units and for creating aggregate variables on those units. We show that applying the resulting designs can identify important causal dependencies in real domains, and we provide examples from academic publishing, movie making and marketing, and peer-production systems. Finally, we discuss the integration of QEDs with other approaches to causal discovery, including joint modeling and directed experimentation.
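
The idea of constructing experimental units and creating aggregate variables on those units can be illustrated with a small sketch. This is not the authors' system, which the abstract describes as relying on first-order logic and a theorem-proving engine. It is a toy illustration under stated assumptions: the `papers` rows, the `build_units` helper, and the choice of the mean as the aggregate are all hypothetical and do not come from the paper.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data (not from the paper): each row links one paper
# to one author, as a many-to-one relationship might in a relational schema.
papers = [
    {"paper_id": 1, "author": "a1", "citations": 10},
    {"paper_id": 2, "author": "a1", "citations": 4},
    {"paper_id": 3, "author": "a2", "citations": 7},
]


def build_units(rows, unit_key, value_key, aggregate=mean):
    """Group rows into units keyed by `unit_key`, then aggregate `value_key`.

    Illustrative assumption: a unit is every row sharing the same value of
    `unit_key`, and the aggregate variable is one summary number per unit.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[unit_key]].append(row[value_key])
    return {unit: aggregate(values) for unit, values in groups.items()}


# One aggregate variable (mean citations) per author-level unit.
units = build_units(papers, "author", "citations")
```

Here each author acts as a unit and the mean citation count of that author's papers acts as the aggregate variable. A different aggregate, such as `max` or `len`, can be passed as the `aggregate` argument to define other variables on the same units.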