Tracing data errors with view-conditioned causality

Authors:
Alexandra Meliou;Wolfgang Gatterbauer;Suman Nath;Dan Suciu
Affiliations:
University of Washington, Seattle, WA, USA;University of Washington, Seattle, WA, USA;Microsoft Research, Redmond, WA, USA;University of Washington, Seattle, WA, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

A surprising query result is often an indication of errors in the query or the underlying data. Recent work suggests using causal reasoning to find explanations for the surprising result. In practice, however, one often has multiple queries and/or multiple answers, some of which may be considered correct and others unexpected. In this paper, we focus on determining the causes of a set of unexpected results, possibly conditioned on some prior knowledge of the correctness of another set of results. We call this problem View-Conditioned Causality. We adapt the definitions of causality and responsibility to the case of multiple answers/views and provide a non-trivial algorithm that reduces the problem of finding causes and their responsibility to a satisfiability problem that can be solved with existing tools. We evaluate both the accuracy and effectiveness of our approach on a real dataset of user-generated mobile device tracking data, and demonstrate that it can identify causes of error more effectively than static Boolean influence and alternative notions of causality.
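To make the contrast between conditioned causality and static Boolean influence concrete, the following is a minimal brute-force sketch. It is not the paper's SAT-based algorithm. It assumes a simplified reading of the definitions: views are Boolean functions over Boolean input variables, and a target vector gives the desired value of each view (unexpected views should flip, views known to be correct should keep their value). A variable counts as a cause if some contingency set Gamma, not containing it, fails to reach the target when flipped alone but reaches it when the variable is flipped too. Its responsibility is 1 / (1 + |Gamma|) for the smallest such Gamma. Influence is taken as the standard fraction of assignments on which flipping a variable changes a single view. The helper names (flip, matches, responsibility, influence) and the toy views are hypothetical, and the enumeration is exponential, so it only suits tiny examples.

```python
from itertools import combinations, product
from typing import Callable, Dict, List, Sequence

Assignment = Dict[str, bool]
View = Callable[[Assignment], bool]


def flip(assign: Assignment, names) -> Assignment:
    """Return a copy of `assign` with the given variables negated."""
    out = dict(assign)
    for n in names:
        out[n] = not out[n]
    return out


def matches(views: Sequence[View], assign: Assignment, target: Sequence[bool]) -> bool:
    """True if every view evaluates to its desired (target) value."""
    return all(v(assign) == t for v, t in zip(views, target))


def responsibility(var: str, views, assign, target) -> float:
    """Brute-force responsibility under the simplified definition (an assumption):
    1 / (1 + |Gamma|) for the smallest Gamma such that flipping Gamma alone misses
    the target outputs but flipping Gamma plus `var` reaches them; 0.0 if none."""
    others = [n for n in assign if n != var]
    for k in range(len(others) + 1):  # smallest contingencies first
        for gamma in combinations(others, k):
            if (not matches(views, flip(assign, gamma), target)
                    and matches(views, flip(assign, gamma + (var,)), target)):
                return 1.0 / (1 + k)
    return 0.0


def influence(var: str, view: View, names: List[str]) -> float:
    """Static Boolean influence: fraction of all 2^n assignments on which
    flipping `var` changes the output of a single view."""
    total = changed = 0
    for bits in product([False, True], repeat=len(names)):
        a = dict(zip(names, bits))
        total += 1
        changed += view(a) != view(flip(a, [var]))
    return changed / total


# Hypothetical toy instance: all inputs are currently True.
assign = {"x1": True, "x2": True, "x3": True}
views = [
    lambda a: a["x1"] and a["x2"],  # v1: returns True, but considered unexpected
    lambda a: a["x2"] and a["x3"],  # v2: returns True, known to be correct
]
target = [False, True]  # v1 should flip, v2 should keep its value

if __name__ == "__main__":
    for x in assign:
        print(x, "responsibility:", responsibility(x, views, assign, target),
              "influence on v1:", influence(x, views[0], list(assign)))
```

In this toy case x1 and x2 have the same static influence on v1 (0.5 each), but once the known-correct view v2 is taken into account, only x1 is a cause (responsibility 1.0). Flipping x2 would repair v1 only by breaking v2, so its responsibility is 0.0.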