Consider a dataset of $n(d)$ points drawn independently from $\mathbb{R}^d$ according to a common p.d.f. $f_d$ with $\mathrm{support}(f_d) = [0,1]^d$ and $\sup\{f_d(\mathbb{R}^d)\}$ growing sub-exponentially in $d$. We prove that: (i) if $n(d)$ grows sub-exponentially in $d$, then, for any query point $\vec{q} \in [0,1]^d$ and any $\epsilon > 0$, the ratio of the distances from any two dataset points to $\vec{q}$ is less than $1+\epsilon$ with probability $\to 1$ as $d \to \infty$; (ii) if $n(d) > [4(1+\epsilon)]^d$ for large $d$, then for all $\vec{q} \in [0,1]^d$ (except a small subset) and any $\epsilon > 0$, the distance ratio is less than $1+\epsilon$ with limiting probability strictly bounded away from one. Moreover, we provide preliminary results along the lines of (i) when $f_d = N(\vec{\mu}_d, \Sigma_d)$.
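The concentration effect described in (i) can be illustrated with a small simulation. The sketch below is not taken from the paper: it assumes the simplest density with support $[0,1]^d$, the uniform one, holds the dataset size fixed at a hypothetical `n = 200` (which is trivially sub-exponential in $d$), and uses Euclidean distance. The function name `distance_ratio` and the chosen dimensions are illustrative only.

```python
import math
import random


def distance_ratio(d, n, rng):
    """Ratio of the farthest to the nearest Euclidean distance between a
    random query point and n points drawn uniformly from [0,1]^d."""
    q = [rng.random() for _ in range(d)]
    dists = []
    for _ in range(n):
        p = [rng.random() for _ in range(d)]
        dists.append(math.dist(p, q))
    return max(dists) / min(dists)


rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
n = 200                 # assumed dataset size, kept constant across dimensions

# Assumed set of dimensions, chosen only to make the trend visible.
ratios = {d: distance_ratio(d, n, rng) for d in (2, 20, 200, 2000)}

for d, r in ratios.items():
    print(f"d={d:5d}  max/min distance ratio = {r:.3f}")
```

If every pairwise distance ratio is below $1+\epsilon$, then so is the max/min ratio printed here, so this single number is a convenient proxy for the quantity in (i). Under these assumptions the ratio should shrink toward 1 as $d$ grows. The sketch does not attempt to reproduce regime (ii), where $n(d)$ grows exponentially in $d$.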