An Empirical Study of a New Approach to Nearest Neighbor Searching

  • Authors:
  • Songrit Maneewongvatana;David M. Mount

  • Affiliations:
  • -;-

  • Venue:
  • ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
  • Year:
  • 2001

Quantified Score

Hi-index 0.00

Visualization

Abstract

In nearest neighbor searching we are given a set of n data points in real d-dimensional space, Rd, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. In this paper we consider a novel approach to nearest neighbor searching, in which the search returns the correct nearest neighbor with a given probability assuming that the queries are drawn from some known distribution. The query distribution is represented by providing a set of training query points at preprocessing time. The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which is used to determine whether the search should visit this node. We use principal component analysis and support vector machines to analyze the structure of the data and training points in order to better adapt the tree structure to the data sets. We show empirically that this new approach provides improved predictability over the kd-tree in average query performance.