Probably correct k-nearest neighbor search in high dimensions

Authors:
Jun Toyama;Mineichi Kudo;Hideyuki Imai
Affiliations:
Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan;Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan;Division of Computer Science, Graduate School of Information Science and Technology, Hokkaido University, Sapporo 060-0814, Japan
Venue:
Pattern Recognition
Year:
2010

Abstract

A novel approach to k-nearest neighbor (k-NN) search under the Euclidean metric is described. It is well known that many sophisticated algorithms cannot beat the brute-force algorithm when the dimensionality is high. In this study, a probably correct approach, in which the correct set of k nearest neighbors is obtained with high probability, is proposed to greatly reduce the search time. We exploit the marginal distribution of the k-th nearest neighbors in low dimensions, which is estimated from the stored data (an empirical percentile approach). We analyze the basic nature of this marginal distribution and show the advantage of the implemented algorithm, which is a probabilistic variant of partial distance search. Its query time is sublinear in the data size n, that is, O(mnδ) with δ = o(1) in n and δ ≤ 1, for any fixed dimension m.
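
For orientation, the exact partial distance search that the abstract names as the basis of the proposed algorithm can be sketched as below. This is a minimal illustration, not the authors' implementation: the function names `partial_distance_knn` and `brute_force_knn` are hypothetical, and the probabilistic rejection test derived from the estimated marginal distribution is not reproduced, because the abstract does not specify it. The sketch only shows the exact baseline, in which the squared distance to a candidate is accumulated one coordinate at a time and the candidate is dropped as soon as the partial sum reaches the current k-th best distance.

```python
import heapq
import random


def brute_force_knn(data, query, k):
    """Reference answer: compute every full squared distance, keep the k smallest."""
    dists = [(sum((x - q) ** 2 for x, q in zip(p, query)), i)
             for i, p in enumerate(data)]
    return sorted(dists)[:k]


def partial_distance_knn(data, query, k):
    """Exact k-NN by partial distance search (squared Euclidean distance).

    Returns a sorted list of (squared_distance, index) pairs.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heap = []  # max-heap emulated with negated distances: (-dist2, index)
    for idx, point in enumerate(data):
        # Rejection bound: the current k-th smallest squared distance,
        # or infinity while fewer than k candidates have been kept.
        bound = -heap[0][0] if len(heap) == k else float("inf")
        partial = 0.0
        rejected = False
        for x, q in zip(point, query):
            partial += (x - q) ** 2
            if partial >= bound:
                # The full distance can only grow, so this point cannot
                # improve on the current k-th neighbor.
                rejected = True
                break
        if rejected:
            continue
        if len(heap) == k:
            heapq.heapreplace(heap, (-partial, idx))
        else:
            heapq.heappush(heap, (-partial, idx))
    return sorted((-neg, idx) for neg, idx in heap)


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0, 1) for _ in range(32)] for _ in range(1000)]
    query = [rng.gauss(0, 1) for _ in range(32)]
    print(partial_distance_knn(data, query, 3) == brute_force_knn(data, query, 3))
```

Because squared differences are non-negative, the early exit never discards a true neighbor, so this baseline returns the same set as brute force. A probably correct variant, as described in the abstract, would trade that guarantee for a more aggressive rejection rule that is correct only with high probability.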