A novel probabilistic pruning approach to speed up similarity queries in uncertain databases

Authors:
Thomas Bernecker;Tobias Emrich;Hans-Peter Kriegel;Nikos Mamoulis;Matthias Renz;Andreas Zufle
Affiliations:
Department of Computer Science, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80539 Munich, Germany;Department of Computer Science, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80539 Munich, Germany;Department of Computer Science, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80539 Munich, Germany;Department of Computer Science, University of Hong Kong, Pokfulam Road, Hong Kong;Department of Computer Science, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80539 Munich, Germany;Department of Computer Science, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80539 Munich, Germany
Venue:
ICDE '11 Proceedings of the 2011 IEEE 27th International Conference on Data Engineering
Year:
2011

Citing 0
Cited 4

Efficient probabilistic reverse nearest neighbor query processing on uncertain data
Proceedings of the VLDB Endowment

Continuous inverse ranking queries in uncertain streams
SSDBM'11 Proceedings of the 23rd international conference on Scientific and statistical database management

Efficiently processing snapshot and continuous reverse k nearest neighbors queries
The VLDB Journal — The International Journal on Very Large Data Bases

Nearest neighbor searching under uncertainty II
Proceedings of the 32nd symposium on Principles of database systems

Abstract

In this paper, we propose a novel, effective and efficient probabilistic pruning criterion for probabilistic similarity queries on uncertain data. Our approach supports a general uncertainty model that uses continuous probability density functions (PDFs) to describe the (possibly correlated) uncertain attributes of objects. In a nutshell, the problem to be solved is to compute the PDF of the random variable called the probabilistic domination count: given an uncertain database object B, an uncertain reference object R and a set D of uncertain database objects in a multi-dimensional space, the probabilistic domination count is the number of uncertain objects in D that are closer to R than B is. This domination count can be used to answer a wide range of probabilistic similarity queries. Specifically, we propose a novel geometric pruning filter and introduce an iterative filter-refinement strategy that conservatively and progressively estimates the probabilistic domination count efficiently while preserving correctness under possible world semantics. In an experimental evaluation, we show that the proposed technique quickly yields tight probability bounds for the probabilistic domination count, even for large uncertain databases.
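
To make the definition of the domination count concrete, the following is a minimal sketch that estimates its distribution by naive sampling of possible worlds. It is not the geometric pruning filter or the filter-refinement strategy proposed in the paper; it only illustrates the quantity those techniques bound. The function names, the `(mean, sigma)` object format and the choice of independent Gaussian attributes are assumptions made for illustration.

```python
import math
import random
from collections import Counter


def sample_gaussian(mean, sigma, rng):
    """Draw one possible location of an uncertain object.

    Assumption: attributes are independent Gaussians around `mean`.
    """
    return tuple(rng.gauss(m, sigma) for m in mean)


def domination_count_pmf(B, R, D, n_worlds=10000, seed=0):
    """Estimate P(DomCount = k) for k = 0..len(D) by sampling possible worlds.

    B, R and every element of D are hypothetical (mean, sigma) pairs.
    In each sampled world, the count is the number of objects in D
    that lie strictly closer to R than B does.
    """
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_worlds):
        r = sample_gaussian(*R, rng)
        b = sample_gaussian(*B, rng)
        dist_b = math.dist(b, r)
        k = sum(1 for A in D if math.dist(sample_gaussian(*A, rng), r) < dist_b)
        counts[k] += 1
    return [counts[k] / n_worlds for k in range(len(D) + 1)]


# Example: reference object R, candidate B, and three other uncertain objects.
R = ((0.0, 0.0), 0.5)
B = ((2.0, 0.0), 0.5)
D = [((1.0, 0.0), 0.5), ((3.0, 0.0), 0.5), ((0.0, 2.5), 0.5)]
pmf = domination_count_pmf(B, R, D)
print(pmf)
```

As one example of how such a distribution might be used: under the usual definition, B is among the k nearest neighbors of R in a given world exactly when fewer than k objects dominate it, so `sum(pmf[:k])` would estimate that probability. Sampling of this kind gives only approximate values, whereas the approach described in the abstract derives conservative bounds.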