The trichotomy of HAVING queries on a probabilistic database

Authors:
Christopher Ré; Dan Suciu
Affiliations:
Department of Computer Science and Engineering, University of Washington, Seattle, USA; Department of Computer Science and Engineering, University of Washington, Seattle, USA
Venue:
The VLDB Journal — The International Journal on Very Large Data Bases
Year:
2009

Abstract

We study the evaluation of positive conjunctive queries with Boolean aggregate tests (similar to HAVING in SQL) on probabilistic databases. More precisely, we study conjunctive queries with predicate aggregates on probabilistic databases where the aggregation function is one of MIN, MAX, EXISTS, COUNT, SUM, AVG, or COUNT(DISTINCT) and the comparison function is one of =, ≠, <, ≤, >, or ≥. The complexity of evaluating such a query depends on the aggregation function, α, and the comparison function, θ. In this paper, we establish a set of trichotomy results for conjunctive queries with HAVING predicates parametrized by (α, θ). For such queries (without self-joins), one of the following three statements is true: (1) The exact evaluation problem has P-time data complexity; in this case, we call the query safe. (2) The exact evaluation problem is #P-hard, but the approximate evaluation problem has (randomized) P-time data complexity. More precisely, there exists an FPTRAS for the query; in this case, we call the query apx-safe. (3) The exact evaluation problem is #P-hard, and the approximate evaluation problem is also hard; we call these queries hazardous. The precise definition of each class depends on the aggregate considered and the comparison function. Thus, we have queries that are (MAX, ≥)-safe, (COUNT, ≤)-apx-safe, (SUM, =)-hazardous, etc. Our trichotomy result is a significant extension of a previous dichotomy result that classifies Boolean conjunctive queries as safe or not safe. For each of the three classes we present novel techniques. For safe queries, we describe an evaluation algorithm that uses random variables over semirings. For apx-safe queries, we describe an FPTRAS that relies on a novel algorithm for generating a random possible world satisfying a given condition. Finally, for hazardous queries we give novel proofs of hardness of approximation. The results for safe queries were previously announced (in Ré, C., Suciu, D. Efficient evaluation of. In: DBPL, pp. 186–200, 2007), but all other results are new.
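To make the HAVING semantics concrete, here is a minimal Python sketch, assuming a single tuple-independent relation in which each tuple is present independently with a given probability (all function names and data are hypothetical, not from the paper). It computes the exact distribution of COUNT(*) and MAX(y) by combining per-tuple random variables one tuple at a time, loosely in the spirit of the semiring-valued random variables mentioned above, and it draws a possible world conditioned on COUNT(*) >= k as a toy instance of generating a random world that satisfies a condition. It is not the authors' algorithm and does not handle joins, where the trichotomy shows exact evaluation can be #P-hard.

```python
from fractions import Fraction
import random


def count_distribution(probs):
    """Exact distribution of COUNT(*) over independent tuples (hypothetical helper).

    dist[c] = Pr[exactly c tuples present]. Each tuple is a 0/1 random
    variable; adding independent tuples convolves their distributions."""
    dist = [Fraction(1)]
    for p in probs:
        new = [Fraction(0)] * (len(dist) + 1)
        for c, q in enumerate(dist):
            new[c] += q * (1 - p)  # tuple absent: count unchanged
            new[c + 1] += q * p    # tuple present: count + 1
        dist = new
    return dist


def max_distribution(tuples):
    """Exact distribution of MAX(y) over independent (p, y) tuples.

    The key None stands for the empty world, where MAX is undefined."""
    dist = {None: Fraction(1)}
    for p, y in tuples:
        new = {}
        for m, q in dist.items():
            new[m] = new.get(m, Fraction(0)) + q * (1 - p)
            m2 = y if m is None else max(m, y)
            new[m2] = new.get(m2, Fraction(0)) + q * p
        dist = new
    return dist


def prob_count_at_least(probs, k):
    """Pr[HAVING COUNT(*) >= k]; assumes k >= 1, so the empty world never qualifies."""
    return sum(count_distribution(probs)[k:], Fraction(0))


def prob_max_at_least(tuples, k):
    """Pr[HAVING MAX(y) >= k]; the empty world (MAX undefined) does not qualify."""
    return sum((q for m, q in max_distribution(tuples).items()
                if m is not None and m >= k), Fraction(0))


def sample_world_count_at_least(probs, k, rng):
    """Draw presence flags exactly from the distribution conditioned on COUNT(*) >= k."""
    n = len(probs)
    # tail[i][j] = Pr[at least j of tuples i..n-1 are present], for 0 <= j <= k
    tail = [[Fraction(0)] * (k + 1) for _ in range(n + 1)]
    tail[n][0] = Fraction(1)
    for i in range(n - 1, -1, -1):
        p = probs[i]
        for j in range(k + 1):
            tail[i][j] = p * tail[i + 1][max(j - 1, 0)] + (1 - p) * tail[i + 1][j]
    if tail[0][k] == 0:
        raise ValueError("condition COUNT(*) >= k has probability 0")
    world, need = [], k
    for i, p in enumerate(probs):
        # Pr[tuple i present | at least `need` present among tuples i..n-1]
        take = p * tail[i + 1][max(need - 1, 0)] / tail[i][need]
        present = rng.random() < take
        world.append(1 if present else 0)
        if present:
            need = max(need - 1, 0)
    return world
```

For example, with three tuples each present with probability 1/2, `prob_count_at_least([Fraction(1, 2)] * 3, 2)` returns `Fraction(1, 2)`. Each pass touches at most n + 1 aggregate values per tuple, so the cost is polynomial in the number of tuples, whereas enumerating possible worlds directly takes 2^n steps.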