H-Tree: a hybrid structure for confidence computation in probabilistic databases

  • Authors:
  • Qian Zhang;Biao Qin;Shan Wang

  • Affiliations:
  • Key Laboratory of Data Engineering and Knowledge Engineering, Renmin University of China, Ministry of Education, Beijing, China and School of Information, Renmin University of China, Beijing, Chin ...

  • Venue:
  • APWeb'12 Proceedings of the 14th Asia-Pacific international conference on Web Technologies and Applications
  • Year:
  • 2012

Abstract

Probabilistic databases have become a popular tool for managing uncertain data. Most work in the area focuses on efficient query processing and follows one of two main directions: exact or approximate evaluation. Recent work has shown that, for conjunctive queries without self-joins on a tuple-independent probabilistic database, query evaluation is equivalent to computing the marginal probabilities of the Boolean formulas associated with the query results. If a formula can be factorized into a read-once form, in which every variable appears at most once, confidence computation reduces to a tractable problem that can be solved in linear time. Otherwise, it is regarded as an NP-hard problem and needs to be evaluated approximately. In this paper, we propose a framework that efficiently evaluates both tractable and NP-hard conjunctive queries. First, we develop a novel structure, the H-tree, in which Boolean formulas are decomposed into small partitions that are either read-once or NP-hard. Then we propose algorithms for building the H-tree and for parallelizing (approximate) confidence computation. We also present fundamental theorems that ensure the correctness of our approaches. Performance experiments demonstrate the benefits of the H-tree, especially for approximate confidence evaluation on NP-hard queries.
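
To illustrate why read-once formulas are tractable, the following minimal Python sketch computes the marginal probability of a read-once formula over independent tuple variables in a single pass. It is not the paper's H-tree algorithm; the names `Var`, `And`, `Or`, and `confidence` are hypothetical and introduced only for this example. It relies on two standard facts for sub-formulas that share no variables: the probability of a conjunction is the product of the children's probabilities, and the probability of a disjunction is one minus the product of their complements.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union


@dataclass(frozen=True)
class Var:
    name: str  # hypothetical tuple variable, e.g. "x1"


@dataclass(frozen=True)
class And:
    children: Tuple["Formula", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Formula", ...]


Formula = Union[Var, And, Or]


def _var_names(f: Formula) -> List[str]:
    """Collect every variable occurrence in the formula tree."""
    if isinstance(f, Var):
        return [f.name]
    names: List[str] = []
    for child in f.children:
        names.extend(_var_names(child))
    return names


def _eval(f: Formula, prob: Dict[str, float]) -> float:
    if isinstance(f, Var):
        return prob[f.name]
    if isinstance(f, And):
        # Children share no variables, so they are independent events.
        p = 1.0
        for child in f.children:
            p *= _eval(child, prob)
        return p
    # Or: complement of "all children false", again by independence.
    q = 1.0
    for child in f.children:
        q *= 1.0 - _eval(child, prob)
    return 1.0 - q


def confidence(f: Formula, prob: Dict[str, float]) -> float:
    """Marginal probability of a read-once formula; visits each node once."""
    names = _var_names(f)
    if len(names) != len(set(names)):
        raise ValueError("formula is not in read-once form")
    return _eval(f, prob)


# (x1 AND y1) OR (x1 AND y2) repeats x1, but it factorizes into the
# read-once form x1 AND (y1 OR y2), which is what we evaluate here.
formula = And((Var("x1"), Or((Var("y1"), Var("y2")))))
print(confidence(formula, {"x1": 0.5, "y1": 0.4, "y2": 0.2}))  # approximately 0.26
```

Passing the unfactorized formula, in which `x1` occurs twice, raises a `ValueError`. The product rules no longer hold once sub-formulas share variables, which is the case that the abstract says must be handled approximately.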