Sensitivity analysis and explanations for robust query evaluation in probabilistic databases

Authors:
Bhargav Kanagal;Jian Li;Amol Deshpande
Affiliations:
University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA;University of Maryland, College Park, MD, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

Abstract

Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management. The first is how to provide explanations for query results, e.g., "Why is this tuple in my result?" or "Why does this output tuple have such a high probability?" The second is how to determine the sensitive input tuples for a given query: users who are unsure about the input probability values want to know which input tuples can substantially alter the output when their probabilities are modified. Existing systems provide, in addition to the output probabilities, the lineage/provenance of each output tuple, which is a Boolean formula indicating the dependence of the output tuple on the input tuples. However, lineage does not immediately provide a quantitative relationship, and it is not informative when we have multiple output tuples. In this paper, we propose a unified framework that can handle both of the issues mentioned above to facilitate robust query processing. We formally define the notions of influence and explanations and provide algorithms to determine the top-l influential set of variables and the top-l set of explanations for a variety of queries, including conjunctive queries, probabilistic threshold queries, top-k queries, and aggregation queries. Further, our framework naturally enables highly efficient incremental evaluation when input probabilities are modified (e.g., if uncertainty is resolved). Our preliminary experimental results demonstrate the benefits of our framework for performing robust query processing over probabilistic databases.
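
The abstract does not spell out how influence is defined, so the following is only an illustrative sketch under stated assumptions. It assumes input tuples are independent and measures the influence of an input tuple as the change in output probability per unit change in that tuple's probability, which for a Boolean lineage formula equals P(lineage | x = true) - P(lineage | x = false). The function names, the example lineage, and the probabilities are all hypothetical, and the brute-force enumeration of possible worlds is suitable only for toy inputs; it is not the paper's algorithm.

```python
from itertools import product


def probability(lineage, probs):
    """P(lineage is true), by enumerating every possible world.

    Assumes independent input tuples. Exponential in the number of
    variables, so this is for toy examples only.
    """
    names = sorted(probs)
    total = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if lineage(world):
            weight = 1.0
            for name in names:
                weight *= probs[name] if world[name] else 1.0 - probs[name]
            total += weight
    return total


def influence(lineage, probs, name):
    """Assumed influence measure: P(lineage | name=1) - P(lineage | name=0)."""
    return (probability(lineage, {**probs, name: 1.0})
            - probability(lineage, {**probs, name: 0.0}))


def top_influential(lineage, probs, l):
    """Return the l variables with the largest absolute influence."""
    ranked = sorted(probs,
                    key=lambda n: abs(influence(lineage, probs, n)),
                    reverse=True)
    return ranked[:l]


# Hypothetical lineage of one output tuple: (x1 AND x2) OR x3.
def lineage(world):
    return (world["x1"] and world["x2"]) or world["x3"]


# Hypothetical input tuple probabilities.
probs = {"x1": 0.9, "x2": 0.5, "x3": 0.2}

if __name__ == "__main__":
    print(round(probability(lineage, probs), 6))
    for name in sorted(probs):
        print(name, round(influence(lineage, probs, name), 6))
    print(top_influential(lineage, probs, 2))
```

Under the independence assumption the output probability is linear in each individual input probability, so a single-variable update can be applied incrementally: raising the probability of x2 from 0.5 to 0.7 changes the output by influence(x2) * 0.2 without re-enumerating the worlds. This gives some intuition for why a sensitivity measure and incremental evaluation fit in one framework.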