SPROUT: Lazy vs. Eager Query Plans for Tuple-Independent Probabilistic Databases

Authors:
Dan Olteanu; Jiewen Huang; Christoph Koch
Venue:
ICDE '09 Proceedings of the 2009 IEEE International Conference on Data Engineering
Year:
2009

Abstract

A paramount challenge in probabilistic databases is the scalable computation of confidences of tuples in query results. This paper introduces an efficient secondary-storage operator for exact computation of queries on tuple-independent probabilistic databases. We consider the conjunctive queries without self-joins that are known to be tractable on any tuple-independent database, and queries that are not tractable in general but become tractable on probabilistic databases restricted by functional dependencies. Our operator is semantically equivalent to a sequence of aggregations and can be naturally integrated into existing relational query plans. As a proof of concept, we developed an extension of the PostgreSQL 8.3.3 query engine called SPROUT. We study optimizations that push or pull our operator or parts thereof past joins. The operator employs static information, such as the query structure and functional dependencies, to decide which constituent aggregations can be evaluated together in one scan and how many scans are needed for the overall confidence computation task. A case study on the TPC-H benchmark reveals that most TPC-H queries obtained by removing aggregations can be evaluated efficiently using our operator. Experimental evaluation on probabilistic TPC-H data shows substantial efficiency improvements when compared to the state of the art.
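
To make the phrase "semantically equivalent to a sequence of aggregations" concrete, the following is a minimal in-memory sketch, not SPROUT's secondary-storage operator. It assumes a hypothetical Boolean query Q :- R(a), S(a, b) without self-joins over two tuple-independent relations, with made-up tuples and probabilities. The names `independent_or`, `confidence_by_aggregation`, and `confidence_by_enumeration` are illustrative only. The confidence is computed by two aggregation steps and then checked against a brute-force enumeration of possible worlds.

```python
from itertools import product
from math import prod

# Hypothetical tuple-independent relations (illustrative data only):
# each tuple maps to its marginal probability of being present.
R = {("a1",): 0.5, ("a2",): 0.8}
S = {("a1", "b1"): 0.4, ("a1", "b2"): 0.5, ("a2", "b1"): 0.1}


def independent_or(probs):
    """Probability that at least one of several independent events occurs."""
    return 1.0 - prod(1.0 - p for p in probs)


def confidence_by_aggregation(R, S):
    """Confidence of the Boolean query Q :- R(a), S(a, b) via two aggregations."""
    # Step 1: group S by a and aggregate away b with independent-or.
    s_by_a = {}
    for (a, _b), p in S.items():
        s_by_a.setdefault(a, []).append(p)
    s_agg = {a: independent_or(ps) for a, ps in s_by_a.items()}
    # Step 2: join with R on a (product of independent events),
    # then aggregate away a with independent-or.
    joined = [p_r * s_agg[a] for (a,), p_r in R.items() if a in s_agg]
    return independent_or(joined)


def confidence_by_enumeration(R, S):
    """Brute-force check: sum the weights of all possible worlds satisfying Q."""
    tuples = [("R", t, p) for t, p in R.items()]
    tuples += [("S", t, p) for t, p in S.items()]
    total = 0.0
    for choice in product([False, True], repeat=len(tuples)):
        # Weight of this world: product over present and absent tuples.
        weight = prod(
            p if keep else 1.0 - p for (_, _, p), keep in zip(tuples, choice)
        )
        r_world = {t for (rel, t, _), keep in zip(tuples, choice) if keep and rel == "R"}
        s_world = {t for (rel, t, _), keep in zip(tuples, choice) if keep and rel == "S"}
        # Q holds if some S tuple joins with some R tuple on a.
        if any((a,) in r_world for (a, _b) in s_world):
            total += weight
    return total
```

The aggregation route touches each tuple once, while the enumeration is exponential in the number of tuples and serves only as a correctness check on small inputs. The grouping order (first eliminate b, then a) follows the structure of this particular query; other queries would need a different order, and some admit no such order at all.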