Top-k query processing in probabilistic databases with non-materialized views

  • Authors: Maximilian Dylla; Martin Theobald; Iris Miliaraki
  • Affiliations: Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany; University of Antwerp, Middelheimlaan 1, 2020 Antwerp, Belgium; Max Planck Institute for Informatics, Campus E1.4, 66123 Saarbrücken, Germany
  • Venue: ICDE '13 Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)
  • Year: 2013

Abstract

We investigate a novel approach to computing confidence bounds for top-k ranking queries in probabilistic databases with non-materialized views. Unlike related approaches, we present an exact pruning algorithm that finds the top-ranked query answers according to their marginal probabilities without first materializing all answer candidates via the views. Specifically, we consider conjunctive queries over multiple levels of select-project-join views, which are cast into Datalog rules that we ground in a top-down fashion directly at query processing time. To our knowledge, this work is the first to address integrated data and confidence computations for intensional query evaluations in the context of probabilistic databases by considering confidence bounds over first-order lineage formulas. We extend our query processing techniques with a suite of scheduling strategies based on selectivity estimation and the expected impact on confidence bounds. Further extensions include improved top-k bounds when sorted relations are available as input, as well as support for recursive rules. Experiments with large datasets demonstrate significant runtime improvements of our approach compared to both exact and sampling-based top-k methods over probabilistic data.
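
The general idea of pruning top-k candidates by confidence bounds can be illustrated with a minimal Python sketch. This is not the algorithm from the paper: the function name `prune_by_bounds`, the dictionary layout, and the example numbers are assumptions made for illustration only. It assumes each answer candidate already carries a lower and an upper bound on its marginal probability, and it drops every candidate whose upper bound lies strictly below the k-th largest lower bound, since at least k other candidates are then guaranteed to rank higher.

```python
import heapq


def prune_by_bounds(bounds, k):
    """Return the candidates that may still belong to the top-k.

    `bounds` maps a candidate answer to a (lower, upper) pair that bounds
    its marginal probability (hypothetical layout, for illustration).
    A candidate is dropped only when its upper bound is strictly below
    the k-th largest lower bound, so no true top-k answer is lost.
    """
    if k <= 0:
        return {}
    lowers = [lower for lower, _ in bounds.values()]
    if len(lowers) <= k:
        # Not more than k candidates: nothing can be pruned yet.
        return dict(bounds)
    # The k-th largest lower bound acts as the pruning threshold.
    threshold = heapq.nlargest(k, lowers)[-1]
    return {
        cand: (lower, upper)
        for cand, (lower, upper) in bounds.items()
        if upper >= threshold
    }


if __name__ == "__main__":
    example = {
        "a": (0.80, 0.90),
        "b": (0.55, 0.70),
        "c": (0.10, 0.50),
        "d": (0.30, 0.60),
    }
    survivors = prune_by_bounds(example, k=2)
    print(sorted(survivors))
```

In this example the threshold is 0.55 (the second-largest lower bound), so candidate "c" with upper bound 0.50 is pruned, while "d" must be kept until its bounds are tightened further.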