Scrubbing query results from probabilistic databases

Authors:
Jianwen Chen; Ling Feng; Wenwei Xue
Affiliations:
Tsinghua University, Beijing, China; Tsinghua University, Beijing, China; Nokia Research Center, Beijing, China
Venue:
Proceedings of the 15th Symposium on International Database Engineering & Applications
Year:
2011

Abstract

Queries over probabilistic databases produce probabilistic results. Because these results are derived from the underlying data probabilities, we believe that involving a user in the query-processing loop, and leveraging the user's personal knowledge about the uncertain data, enables the system to scrub (correct) and tailor its probabilistic query results toward better quality from that user's perspective. In this paper, we propose to open the black box of a probabilistic database query engine and explain to the user how the engine arrives at a probabilistic query result, as well as which uncertain tuples in the database the result is derived from. Based on his or her knowledge of the uncertain information, the user can then not only decide how much confidence to place in the query engine, but also help clarify some of the uncertain information so that the query engine can regenerate an improved query result. We address two issues that arise in such a probabilistic database query framework: (i) how to interact with a user for answer explanation and uncertainty clarification without placing much burden on the user, and (ii) how to scrub (correct) the query result without incurring much computational overhead in the query engine. Our performance study demonstrates the accuracy, effectiveness, and computational efficiency of the proposed framework.
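
The explain-clarify-regenerate loop described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it assumes a tuple-independent probabilistic database, represents an answer's lineage as a disjunction of conjunctions over tuple identifiers, and computes confidence by brute-force enumeration of possible worlds. The names `answer_probability`, `explain`, and `scrub`, and the tuple identifiers `t1` to `t3`, are hypothetical and chosen only for illustration.

```python
from itertools import product


def answer_probability(lineage, tuple_probs):
    """Probability that a DNF lineage holds, assuming independent tuples.

    lineage: list of clauses; each clause is a set of tuple ids that must
             all exist for that derivation of the answer to hold.
    tuple_probs: dict mapping tuple id -> existence probability.
    Enumerates every possible world (exponential; for illustration only).
    """
    ids = sorted({t for clause in lineage for t in clause})
    total = 0.0
    for world in product([True, False], repeat=len(ids)):
        present = {t for t, keep in zip(ids, world) if keep}
        if any(clause <= present for clause in lineage):
            weight = 1.0
            for t, keep in zip(ids, world):
                weight *= tuple_probs[t] if keep else 1.0 - tuple_probs[t]
            total += weight
    return total


def explain(lineage, tuple_probs):
    """List the still-uncertain tuples the answer is derived from."""
    ids = sorted({t for clause in lineage for t in clause})
    return [(t, tuple_probs[t]) for t in ids if 0.0 < tuple_probs[t] < 1.0]


def scrub(tuple_probs, clarifications):
    """Apply user clarifications: True -> certain, False -> ruled out."""
    updated = dict(tuple_probs)
    for t, exists in clarifications.items():
        updated[t] = 1.0 if exists else 0.0
    return updated


# Hypothetical answer derived either from (t1 AND t2) or from t3 alone.
lineage = [{"t1", "t2"}, {"t3"}]
probs = {"t1": 0.5, "t2": 0.5, "t3": 0.5}

before = answer_probability(lineage, probs)
uncertain = explain(lineage, probs)
# The user states that t3 does not exist; the engine recomputes.
after = answer_probability(lineage, scrub(probs, {"t3": False}))
print(before, uncertain, after)
```

In this sketch the user clarifies a single tuple and only the affected answer is recomputed, which mirrors the two stated concerns of low user burden and low recomputation cost. A practical engine would replace the exhaustive enumeration with a more efficient confidence computation.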