An improved algorithm finding nearest neighbor using Kd-trees

Authors:
Rina Panigrahy
Affiliations:
Microsoft Research, Mountain View, CA
Venue:
LATIN'08 Proceedings of the 8th Latin American conference on Theoretical informatics
Year:
2008

Abstract

We suggest a simple modification to the Kd-tree search algorithm for nearest neighbor search that improves its performance. The Kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades once the number of dimensions exceeds two. Since exact nearest neighbor search suffers from the curse of dimensionality, we focus on approximate solutions: a c-approximate nearest neighbor is any point whose distance to the query is at most c times the distance to the nearest neighbor. We show that for a randomly constructed database of points, if the query point is chosen close to one of the points in the database, the traditional Kd-tree search algorithm has a very low probability of finding an approximate nearest neighbor; the probability of success drops exponentially in the number of dimensions d, as $e^{-\Omega(d/c)}$. However, a simple change to the search algorithm gives a much higher chance of success: instead of searching for the query point itself in the Kd-tree, we search for a random set of points in the neighborhood of the query point. It turns out that searching for $e^{\Omega(d/c)}$ such points can find the c-approximate nearest neighbor with a much higher chance of success.
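
The idea of searching from perturbed copies of the query can be illustrated with a minimal sketch. It rests on several assumptions the abstract does not state: a median-split Kd-tree that cycles through the coordinates, a single root-to-leaf descent per probe with no backtracking, and Gaussian perturbations with a hypothetical scale parameter `radius`. The function names (`build_kdtree`, `descend`, `perturbed_search`) are illustrative, and the paper's actual construction and sampling distribution may differ.

```python
import math
import random


def build_kdtree(points, depth=0):
    """Build a Kd-tree by splitting at the median, cycling through the axes.

    Assumption: one point per leaf and median splits; the paper's exact
    construction is not given in the abstract.
    """
    if len(points) <= 1:
        return {"leaf": True, "points": list(points)}
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "leaf": False,
        "axis": axis,
        "split": pts[mid][axis],
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid:], depth + 1),
    }


def descend(tree, q):
    """Follow a single root-to-leaf path for q (no backtracking)."""
    node = tree
    while not node["leaf"]:
        if q[node["axis"]] < node["split"]:
            node = node["left"]
        else:
            node = node["right"]
    return node["points"]


def perturbed_search(tree, q, num_probes, radius, rng):
    """Descend once for q and once for each random point near q.

    Returns the candidate closest to the ORIGINAL query q.
    Assumption: probes are q plus independent Gaussian noise per coordinate.
    """
    best, best_dist = None, math.inf
    for i in range(num_probes):
        # The first probe is the unmodified query, so extra probes can only help.
        probe = q if i == 0 else [x + rng.gauss(0.0, radius) for x in q]
        for p in descend(tree, probe):
            d = math.dist(p, q)
            if d < best_dist:
                best, best_dist = p, d
    return best, best_dist
```

With `num_probes=1` this reduces to the plain single-descent search. Larger values trade query time for a better chance that some probe lands in the leaf holding a near neighbor, which is the trade-off the abstract quantifies as $e^{\Omega(d/c)}$ probes.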