Efficient evaluation of HAVING queries on a probabilistic database

Authors:
Christopher Ré;Dan Suciu
Affiliations:
Department of Computer Science and Engineering, University of Washington;Department of Computer Science and Engineering, University of Washington
Venue:
DBPL'07 Proceedings of the 11th international conference on Database programming languages
Year:
2007

Abstract

We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING queries in SQL) on probabilistic databases. Our motivation is to handle aggregate queries over imprecise data resulting from information integration or information extraction. More precisely, we study conjunctive queries with predicate aggregates using MIN, MAX, COUNT, SUM, AVG or COUNT(DISTINCT) on probabilistic databases. Computing the precise output probabilities for positive conjunctive queries (without HAVING) is #P-hard in general, but is in P for a restricted class of queries called safe queries. Further, a query without self-joins is either safe or has #P-hard data complexity, which shows that safe queries exactly capture the tractable queries without self-joins. In this paper, for each aggregate above, we find a class of queries that exactly captures efficient evaluation for HAVING queries without self-joins. Our algorithms use a novel technique to compute the marginal distributions of elements in a semiring, which may be of independent interest.
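
To make the notion of a Boolean aggregate test concrete, the following is a minimal sketch of the simplest case only: a single table whose tuples are present independently, and the test COUNT(*) >= k. It is not the paper's algorithm. The function names `count_distribution` and `prob_having_count_at_least` are hypothetical, and the tuple-independence assumption is ours. The sketch builds the marginal distribution of the count one tuple at a time, then sums the probability mass that satisfies the test.

```python
from typing import Dict, List


def count_distribution(probs: List[float]) -> Dict[int, float]:
    """Marginal distribution of COUNT(*) over independent tuples.

    probs[i] is the (assumed independent) probability that tuple i is present.
    Returns a map from each possible count to its probability.
    """
    dist: Dict[int, float] = {0: 1.0}  # with no tuples, the count is 0
    for p in probs:
        nxt: Dict[int, float] = {}
        for count, mass in dist.items():
            # Tuple absent: the count is unchanged.
            nxt[count] = nxt.get(count, 0.0) + mass * (1.0 - p)
            # Tuple present: the count increases by one.
            nxt[count + 1] = nxt.get(count + 1, 0.0) + mass * p
        dist = nxt
    return dist


def prob_having_count_at_least(probs: List[float], k: int) -> float:
    """Probability that the test COUNT(*) >= k holds."""
    dist = count_distribution(probs)
    return sum(mass for count, mass in dist.items() if count >= k)


if __name__ == "__main__":
    # Two tuples, each present with probability 0.5.
    print(prob_having_count_at_least([0.5, 0.5], 1))
```

The loop runs in time quadratic in the number of tuples, because the count can take at most n + 1 values. Queries with joins, or aggregates such as SUM over large value domains, need more care and are outside the scope of this sketch.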