An Empirical Study of a New Approach to Nearest Neighbor Searching

Authors:
Songrit Maneewongvatana;David M. Mount
Venue:
ALENEX '01 Revised Papers from the Third International Workshop on Algorithm Engineering and Experimentation
Year:
2001

Abstract

In nearest neighbor searching we are given a set of n data points in real d-dimensional space, R^d, and the problem is to preprocess these points into a data structure, so that given a query point, the nearest data point to the query point can be reported efficiently. Because data sets can be quite large, we are interested in data structures that use optimal O(dn) storage. In this paper we consider a novel approach to nearest neighbor searching, in which the search returns the correct nearest neighbor with a given probability assuming that the queries are drawn from some known distribution. The query distribution is represented by providing a set of training query points at preprocessing time. The data structure, called the overlapped split tree, is an augmented BSP-tree in which each node is associated with a cover region, which is used to determine whether the search should visit this node. We use principal component analysis and support vector machines to analyze the structure of the data and training points in order to better adapt the tree structure to the data sets. We show empirically that this new approach provides improved predictability over the kd-tree in average query performance.
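
The idea of a cover region that decides whether a node is visited can be illustrated with a small sketch. This is not the authors' implementation: it assumes axis-aligned median splits and a fixed overlap width (the hypothetical parameter `margin`) in place of the PCA- and SVM-based construction the abstract describes, and all names (`Node`, `build`, `search`, `success_rate`) are made up for illustration. A child is searched only when the query falls inside its cover region, so the answer can be wrong for some queries; the fraction of training queries answered correctly gives an empirical estimate of the success probability.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    points: Optional[List[Point]] = None  # set only on leaves
    axis: int = 0
    lo_max: float = 0.0  # left cover region (assumed form): q[axis] <= lo_max
    hi_min: float = 0.0  # right cover region (assumed form): q[axis] >= hi_min
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build(points: List[Point], margin: float, leaf_size: int = 4, depth: int = 0) -> Node:
    """Median split on a cycling axis; the two cover regions overlap by 2 * margin."""
    if len(points) <= leaf_size:
        return Node(points=list(points))
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    split = (pts[mid - 1][axis] + pts[mid][axis]) / 2
    return Node(
        axis=axis,
        lo_max=split + margin,
        hi_min=split - margin,
        left=build(pts[:mid], margin, leaf_size, depth + 1),
        right=build(pts[mid:], margin, leaf_size, depth + 1),
    )


def search(node: Node, q: Point) -> Point:
    """Visit a child only if q lies in that child's cover region."""
    if node.points is not None:
        return min(node.points, key=lambda p: sq_dist(p, q))
    best: Optional[Point] = None
    if q[node.axis] <= node.lo_max:
        best = search(node.left, q)
    if q[node.axis] >= node.hi_min:
        cand = search(node.right, q)
        if best is None or sq_dist(cand, q) < sq_dist(best, q):
            best = cand
    return best


def success_rate(tree: Node, data: List[Point], queries: List[Point]) -> float:
    """Fraction of queries for which the tree returns a true nearest neighbor."""
    hits = 0
    for q in queries:
        true_d = min(sq_dist(p, q) for p in data)
        hits += sq_dist(search(tree, q), q) == true_d
    return hits / len(queries)


rng = random.Random(0)
data = [(rng.random(), rng.random()) for _ in range(200)]
training_queries = [(rng.random(), rng.random()) for _ in range(100)]

narrow = build(data, margin=0.0)        # no overlap: fastest, may miss the true neighbor
wide = build(data, margin=float("inf"))  # full overlap: every leaf is visited, always exact
```

Calling `success_rate(narrow, data, training_queries)` for a few values of `margin` shows the trade-off the abstract refers to: a wider overlap visits more nodes and fails less often. In this toy version the overlap is a single global constant, whereas the paper adapts the tree to the data and training points.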