A top-k filter for logic-based similarity conditions on probabilistic databases
ADBIS'12 Proceedings of the 16th East European conference on Advances in Databases and Information Systems
Anytime approximation in probabilistic databases
The VLDB Journal — The International Journal on Very Large Data Bases
Hi-index | 0.00 |
In many applications of probabilistic databases, the probabilities are mere degrees of uncertainty in the data and are not otherwise meaningful to the user. Often, users care only about the ranking of answers in decreasing order of their probabilities or about a few most likely answers. In this paper, we investigate the problem of ranking query answers in probabilistic databases. We give a dichotomy for ranking in case of conjunctive queries without repeating relation symbols: it is either in polynomial time or \#P-hard. Surprisingly, our syntactic characterisation of tractable queries is not the same as for probability computation. The key observation is that there are queries for which probability computation is \#P-hard, yet ranking can be computed in polynomial time. This is possible whenever probability computation for distinct answers has a common factor that is hard to compute but irrelevant for ranking. We complement this tractability analysis with an effective ranking technique for conjunctive queries. Given a query, we construct a share plan, which exposes sub queries whose probability computation can be shared or ignored across query answers. Our technique combines share plans with incremental approximate probability computation of sub queries. We implemented our technique in the SPROUT query engine and report on performance gains of orders of magnitude over Monte Carlo simulation using FPRAS and exact probability computation based on knowledge compilation.