Generalizing the optimality of multi-step k-nearest neighbor query processing

  • Authors:
  • Hans-Peter Kriegel;Peer Kröger;Peter Kunath;Matthias Renz

  • Affiliations:
  • Institute for Computer Science, Ludwig-Maximilians Universität München;Institute for Computer Science, Ludwig-Maximilians Universität München;Institute for Computer Science, Ludwig-Maximilians Universität München;Institute for Computer Science, Ludwig-Maximilians Universität München

  • Venue:
  • SSTD'07 Proceedings of the 10th international conference on Advances in spatial and temporal databases
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

Similarity search algorithms that directly rely on index structures and require a lot of distance computations are usually not applicable to databases containing complex objects and defining costly distance functions on spatial, temporal and multimedia data. Rather, the use of an adequate multi-step query processing strategy is crucial for the performance of a similarity search routine that deals with complex distance functions. Reducing the number of candidates returned from the filter step which then have to be exactly evaluated in the refinement step is fundamental for the efficiency of the query process. The state-of-the-art multi-step k-nearest neighbor (kNN) search algorithms are designed to use only a lower bounding distance estimation for candidate pruning. However, in many applications, also an upper bounding distance approximation is available that can additionally be used for reducing the number of candidates. In this paper, we generalize the traditional concept of R-optimality and introduce the notion of RI -optimality depending on the distance information I available in the filter step. We propose a new multi-step kNN search algorithm that utilizes lower- and upper bounding distance information (Ilu) in the filter step. Furthermore, we show that, in contrast to existing approaches, our proposed solution is RIlu - optimal. In an experimental evaluation, we demonstrate the significant performance gain over existing methods.