Read-once functions and query evaluation in probabilistic databases

Authors:
Prithviraj Sen;Amol Deshpande;Lise Getoor
Affiliations:
Yahoo! Labs, Bangalore, India;University of Maryland, College Park;University of Maryland, College Park
Venue:
Proceedings of the VLDB Endowment
Year:
2010



Abstract

Probabilistic databases hold the promise of being a viable means for large-scale uncertainty management, which is increasingly needed in a number of real-world application domains. However, query evaluation in probabilistic databases remains a computational challenge. Prior work on efficient exact query evaluation in probabilistic databases has largely concentrated on query-centric formulations (e.g., safe plans, hierarchical queries), in that they consider only characteristics of the query and not the data in the database. It is easy to construct examples where a supposedly hard query run on an appropriate database gives rise to a tractable query evaluation problem. In this paper, we develop efficient query evaluation techniques that leverage characteristics of both the query and the data in the database. We focus on tuple-independent databases, where the query evaluation problem is equivalent to computing marginal probabilities of Boolean formulas associated with the result tuples. This latter task is easy if the Boolean formulas can be factorized into a form in which every variable appears at most once (called read-once). However, a naive approach that directly uses previously developed Boolean formula factorization algorithms is inefficient, because those algorithms require the input formulas to be in disjunctive normal form (DNF). We instead develop novel, more efficient factorization algorithms that directly construct the read-once expression for a result tuple Boolean formula (if one exists), for a large subclass of queries (specifically, conjunctive queries without self-joins). We empirically demonstrate that (1) our proposed techniques are orders of magnitude faster than generic inference algorithms for queries where the result Boolean formulas can be factorized into read-once expressions, and (2) for the special case of hierarchical queries, they rival the efficiency of prior techniques specifically designed to handle such queries.
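
To illustrate why the marginal probability is easy to compute once a read-once form is available, the following is a minimal Python sketch. It is not the factorization algorithm of the paper: it assumes the read-once expression is already given, and the nested-tuple formula encoding and the names `prob_read_once`, `evaluate`, and `prob_brute_force` are hypothetical choices made for this illustration. Because every variable appears at most once and tuples are independent, the children of each AND or OR node are independent events, so one bottom-up pass suffices. A brute-force enumeration over all possible worlds is included only as a cross-check.

```python
from itertools import product

# Hypothetical encoding (an assumption of this sketch, not from the paper):
#   ("var", name)            a tuple variable
#   ("and", [child, ...])    conjunction
#   ("or",  [child, ...])    disjunction


def prob_read_once(node, p):
    """Probability that a read-once formula is true, given independent
    variable probabilities p[name]. Runs in time linear in formula size."""
    kind = node[0]
    if kind == "var":
        return p[node[1]]
    child_probs = [prob_read_once(child, p) for child in node[1]]
    if kind == "and":
        # Children share no variables, so their probabilities multiply.
        result = 1.0
        for q in child_probs:
            result *= q
        return result
    if kind == "or":
        # P(at least one child true) = 1 - P(all children false).
        all_false = 1.0
        for q in child_probs:
            all_false *= 1.0 - q
        return 1.0 - all_false
    raise ValueError(f"unknown node kind: {kind!r}")


def evaluate(node, world):
    """Truth value of the formula in one possible world (name -> bool)."""
    kind = node[0]
    if kind == "var":
        return world[node[1]]
    values = [evaluate(child, world) for child in node[1]]
    return all(values) if kind == "and" else any(values)


def prob_brute_force(node, p):
    """Exponential-time cross-check: sum the weights of all satisfying worlds."""
    names = sorted(p)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        weight = 1.0
        for name in names:
            weight *= p[name] if world[name] else 1.0 - p[name]
        if evaluate(node, world):
            total += weight
    return total


# The DNF  x1 y1 OR x1 y2 OR x2 y3  repeats x1, but it factorizes into the
# read-once form  x1 (y1 OR y2) OR x2 y3.
formula = ("or", [
    ("and", [("var", "x1"), ("or", [("var", "y1"), ("var", "y2")])]),
    ("and", [("var", "x2"), ("var", "y3")]),
])
probs = {"x1": 0.5, "x2": 0.5, "y1": 0.5, "y2": 0.5, "y3": 0.5}

print(prob_read_once(formula, probs))
print(prob_brute_force(formula, probs))
```

The bottom-up pass visits each node once, whereas the brute-force check enumerates every possible world, which is what makes the read-once form attractive when it exists.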