Accelerating High-Dimensional Nearest Neighbor Queries

  • Authors:
  • Christian A. Lang; Ambuj K. Singh

  • Affiliations:
  • -;-

  • Venue:
  • SSDBM '02 Proceedings of the 14th International Conference on Scientific and Statistical Database Management
  • Year:
  • 2002

Abstract

The performance of nearest neighbor (NN) queries degrades noticeably with increasing dimensionality of the data due to reduced selectivity of high-dimensional data and an increased number of seek operations during NN-query execution. If the NN-radii were known in advance, the disk accesses could be reordered such that seek operations are minimized. We therefore propose a new way of estimating the NN-radius based on the fractal dimensionality and sampling. It is applicable to any page-based index structure. We show that the estimation error is considerably lower than for previous approaches. In the second part of the paper, we present two applications of this technique. We show how the radius estimations can be used to transform k-NN queries into at most two range queries, and how they can be used to reduce the number of page reads during all-NN queries. In both cases, we observe significant speedups over traditional techniques for synthetic and real-world data.
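
To illustrate the idea of answering a k-NN query with at most two range queries, here is a minimal Python sketch. It is not the paper's algorithm: the abstract does not give the estimator, so the scaling rule (point count within radius r grows roughly as r^D, with D the fractal dimensionality), the linear-scan `range_query` standing in for a page-based index, and all function and parameter names are assumptions made for illustration. The sketch relies on one safe fact: the k-th NN distance measured inside a sample of the data is an upper bound on the true k-NN radius, so a second range query with that radius always returns at least k points.

```python
import math
import random

def range_query(data, q, r):
    """Linear-scan stand-in (assumption) for an index range query."""
    return [p for p in data if math.dist(p, q) <= r]

def knn_via_range_queries(data, sample, q, k, fractal_dim):
    """Return (k nearest neighbors of q, number of range queries used).

    `sample` must be a subset of `data` with at least k points.
    """
    # k-th NN distance within the sample: never smaller than the true
    # k-NN radius, because every sample point is also a data point.
    r_upper = sorted(math.dist(p, q) for p in sample)[k - 1]

    # Optimistic guess (assumed scaling rule): shrink the sample radius
    # by (sample size / data size)^(1/D).
    r_guess = r_upper * (len(sample) / len(data)) ** (1.0 / fractal_dim)

    hits = range_query(data, q, r_guess)
    n_queries = 1
    if len(hits) < k:
        # Guess was too small: fall back to the guaranteed radius.
        hits = range_query(data, q, r_upper)
        n_queries = 2

    hits.sort(key=lambda p: math.dist(p, q))
    return hits[:k], n_queries

# Small demo on uniform synthetic data (8 dimensions, so D = 8 is assumed).
random.seed(0)
data = [tuple(random.random() for _ in range(8)) for _ in range(2000)]
sample = random.sample(data, 200)
query = tuple(random.random() for _ in range(8))
neighbors, n_queries = knn_via_range_queries(data, sample, query, 5, 8.0)
print("range queries used:", n_queries)
```

Whenever the first range query already returns at least k points, the k closest of them are the exact answer, since every point within the guessed radius has been retrieved. The quality of the radius estimate therefore only affects cost (how often the second query is needed and how many extra points are read), not correctness.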