Approximate high-dimensional nearest neighbor queries using R-forests

  • Authors:
  • Michael Nolen;King-Ip Lin

  • Affiliations:
  • The University of Memphis, Memphis, TN;The University of Memphis, Memphis, TN

  • Venue:
  • Proceedings of the 17th International Database Engineering & Applications Symposium
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Highly efficient query processing on high-dimensional data, while important, is still a challenge nowadays -- as the curse of dimensionality makes efficient solution very difficult. On the other hand, there have been suggestions that it is better off if one can return a solution quickly, that is close enough, to be sufficient. In this paper we will introduce the concept R-Forest, comprised of a set of disjoint R-trees built over the domain of the search space. Each R-tree will store a sub-set of points in a non-overlapping space, which is maintained throughout the life of the forest. Also included are several new features, Median point used for ordering and searching a pruning parameter, as well as restricted access. When all of these are combined together they can be used to answer Approximate Nearest Neighbor queries, returning a result that is an improvement over alternative methods, such as Locality Sensitive Hashing B-Tree (LSB-tree) with the same amount of IO. With our approach to this difficult problem, we are able to handle different data distribution, even taking advantage of the distribution without any additional parameter tuning, scales with increasing dimensionality and most importantly provides the user with some feedback, in terms of lower bound as to the quality of the results.