H-Tree: a hybrid structure for confidence computation in probabilistic databases

Authors:
Qian Zhang; Biao Qin; Shan Wang
Affiliations:
Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Ministry of Education, Beijing, China and School of Information, Renmin University of China, Beijing, Chin ...;Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Ministry of Education, Beijing, China and School of Information, Renmin University of China, Beijing, Chin ...;Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Ministry of Education, Beijing, China and School of Information, Renmin University of China, Beijing, Chin ...
Venue:
APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
Year:
2012

Abstract

Probabilistic databases have become a popular tool for managing uncertain data. Most work in the area focuses on efficient query processing and follows one of two directions: exact or approximate evaluation. Recent work shows that, for conjunctive queries without self-joins on a tuple-independent probabilistic database, query evaluation is equivalent to computing the marginal probabilities of the Boolean formulas associated with the query results. If a formula can be factorized into read-once form, where every variable appears at most once, confidence computation becomes tractable and can be carried out in linear time. Otherwise, it is regarded as an NP-hard problem and needs to be evaluated approximately. In this paper, we propose a framework that evaluates both tractable and NP-hard conjunctive queries efficiently. First, we develop a novel structure, the H-tree, in which Boolean formulas are decomposed into small partitions that are either read-once or NP-hard. Then we propose algorithms for building the H-tree and for parallelizing (approximate) confidence computation. We also present fundamental theorems that ensure the correctness of our approaches. Performance experiments demonstrate the benefits of the H-tree, especially for approximate confidence evaluation on NP-hard queries.
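To illustrate why read-once form makes confidence computation linear, the following is a minimal sketch in Python. It is not the H-tree algorithm from the paper: the class names `Var`, `And`, and `Or` and the function `confidence` are hypothetical, chosen only for this example. It assumes a tuple-independent setting, so that the sub-formulas under each operator share no variables and their probabilities can be combined by simple products.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Var:
    """A Boolean variable for one tuple, with its marginal probability."""
    name: str
    prob: float


@dataclass(frozen=True)
class And:
    """Conjunction of sub-formulas over pairwise disjoint variable sets."""
    children: Tuple["Formula", ...]


@dataclass(frozen=True)
class Or:
    """Disjunction of sub-formulas over pairwise disjoint variable sets."""
    children: Tuple["Formula", ...]


Formula = Union[Var, And, Or]


def confidence(node: Formula) -> float:
    """Probability that a read-once formula is true.

    Each node is visited once, so the cost is linear in the formula size.
    Correctness relies on the assumption that every variable occurs at
    most once, which makes the children of any node independent.
    """
    if isinstance(node, Var):
        return node.prob
    if isinstance(node, And):
        # Independent conjunction: multiply the children's probabilities.
        result = 1.0
        for child in node.children:
            result *= confidence(child)
        return result
    if isinstance(node, Or):
        # Independent disjunction: 1 minus the probability that all fail.
        all_false = 1.0
        for child in node.children:
            all_false *= 1.0 - confidence(child)
        return 1.0 - all_false
    raise TypeError(f"unsupported node type: {type(node).__name__}")


# Example: (x1 AND y1) OR (x2 AND y2), every variable with probability 0.5.
formula = Or((
    And((Var("x1", 0.5), Var("y1", 0.5))),
    And((Var("x2", 0.5), Var("y2", 0.5))),
))
print(confidence(formula))  # 1 - (1 - 0.25)**2 = 0.4375
```

A formula such as (x1 AND y1) OR (x1 AND y2) OR (x2 AND y2) has no equivalent read-once form. Applying the same product rules to it would treat the repeated variables as independent and give a wrong answer, which is why such formulas call for a different, typically approximate, evaluation strategy.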