New instability results for high-dimensional nearest neighbor search

Authors:
Chris Giannella
Affiliations:
The MITRE Corporation, Hanover, MD, USA
Venue:
Information Processing Letters
Year:
2009

Abstract

Consider a dataset of $n(d)$ points generated independently from $\mathbb{R}^d$ according to a common p.d.f. $f_d$ with $\mathrm{support}(f_d) = [0,1]^d$ and $\sup\{f_d(\mathbb{R}^d)\}$ growing sub-exponentially in $d$. We prove that: (i) if $n(d)$ grows sub-exponentially in $d$, then, for any query point $\vec{q} \in [0,1]^d$ and any $\epsilon > 0$, the ratio of the distances from $\vec{q}$ to any two dataset points is less than $1+\epsilon$ with probability $\to 1$ as $d \to \infty$; (ii) if $n(d) > [4(1+\epsilon)]^d$ for large $d$, then for all $\vec{q} \in [0,1]^d$ (except a small subset) and any $\epsilon > 0$, the distance ratio is less than $1+\epsilon$ with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when $f_d = N(\vec{\mu}_d, \Sigma_d)$.
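The effect described in (i) can be observed numerically. The sketch below is only a Monte Carlo illustration, not the paper's proof. It assumes a uniform density on $[0,1]^d$ (its support is $[0,1]^d$ and its supremum is 1 for every $d$, so it meets the sub-exponential condition trivially), a fixed dataset size $n = 100$ (constant, hence sub-exponential in $d$), and Euclidean distance, since the abstract does not name the metric. The helper names `max_min_distance_ratio` and `mean_ratio` are hypothetical. Because every pairwise distance ratio is bounded by the largest distance divided by the smallest, the sketch tracks that max/min ratio.

```python
import math
import random


def max_min_distance_ratio(n, d, rng):
    """Draw n points and one query uniformly from [0,1]^d (assumed density)
    and return max_i ||x_i - q|| / min_i ||x_i - q|| (assumed Euclidean)."""
    q = [rng.random() for _ in range(d)]
    dists = [math.dist([rng.random() for _ in range(d)], q) for _ in range(n)]
    # If this ratio is below 1 + eps, so is the ratio for every pair of points.
    return max(dists) / min(dists)


def mean_ratio(n, d, trials=20, seed=0):
    """Average the max/min ratio over independent draws of data and query."""
    rng = random.Random(seed)
    return sum(max_min_distance_ratio(n, d, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    # n stays fixed while d grows, matching the regime of result (i).
    for d in (2, 10, 100, 1000):
        print(d, round(mean_ratio(n=100, d=d), 3))
```

With these assumptions the printed ratio should shrink toward 1 as $d$ grows. The sketch says nothing about result (ii), whose regime requires $n(d)$ to grow exponentially in $d$.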