Nearest Neighbors Can Be Found Efficiently If the Dimension Is Small Relative to the Input Size

Authors:
Michiel Hagedoorn
Affiliations:
-
Venue:
ICDT '03 Proceedings of the 9th International Conference on Database Theory
Year:
2003

Abstract

We consider the problem of nearest-neighbor search for a set of n data points in d-dimensional Euclidean space. We propose a simple, practical data structure, which is essentially a directed acyclic graph in which each node has at most two outgoing arcs. We analyze the performance of this data structure for the setting in which the n data points are chosen independently from a d-dimensional ball under the uniform distribution. In the average case, for fixed dimension d, we achieve a query time of O(log^2 n) using only O(n) storage space. For variable dimension, both the query time and the storage space are multiplied by a dimension-dependent factor that is at most exponential in d. This is an improvement over previously known time-space tradeoffs, which all have a super-exponential factor of at least d^Θ(d) either in the query time or in the storage space. Our data structure can be stored efficiently in secondary memory: in a standard secondary-memory model, for fixed dimension d, we achieve average-case bounds of O((log^2 n)/B + log n) query time and O(N) storage space, where B is the block-size parameter and N = n/B. Our data structure is not limited to Euclidean space; its definition generalizes to all possible choices of query objects, data objects, and distance functions.
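To make the problem setting concrete, the following Python sketch draws n points uniformly from the unit ball in d dimensions and answers a query by brute-force linear scan. It is only a baseline for illustration, with O(n d) query time; it is not the graph-based data structure proposed in the paper, whose construction the abstract does not describe. The function names `sample_uniform_ball` and `nearest_neighbor_linear_scan`, the unit radius, and the parameter values are assumptions chosen for this example.

```python
import math
import random


def sample_uniform_ball(d, rng):
    """Draw one point uniformly at random from the unit ball in R^d."""
    # A standard Gaussian vector has a uniformly distributed direction.
    g = [rng.gauss(0.0, 1.0) for _ in range(d)]
    norm = math.sqrt(sum(x * x for x in g))
    # Scaling the radius by U^(1/d) makes the density uniform in volume.
    r = rng.random() ** (1.0 / d)
    return [r * x / norm for x in g]


def nearest_neighbor_linear_scan(points, query):
    """Return the index of the point closest to `query` (Euclidean distance)."""
    best_index, best_dist2 = -1, math.inf
    for i, p in enumerate(points):
        # Squared distance preserves the ordering and avoids a square root.
        dist2 = sum((a - b) ** 2 for a, b in zip(p, query))
        if dist2 < best_dist2:
            best_index, best_dist2 = i, dist2
    return best_index


# Hypothetical parameters: n = 1000 data points in dimension d = 3.
rng = random.Random(0)
d, n = 3, 1000
points = [sample_uniform_ball(d, rng) for _ in range(n)]
query = sample_uniform_ball(d, rng)
idx = nearest_neighbor_linear_scan(points, query)
```

A linear scan touches every data point on each query, which is the cost that a sublinear-time structure such as the one analyzed in the paper is meant to avoid.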