Curse of Dimensionality in Pivot Based Indexes

Authors:
Ilya Volnyansky;Vladimir Pestov
Venue:
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Year:
2009

Abstract

We offer a theoretical validation of the curse of dimensionality in pivot-based indexing of datasets for similarity search by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform a linear scan. We study the asymptotic performance of pivot-based indexing schemes on a sequence of datasets modeled as i.i.d. samples drawn from a sequence of metric spaces. The size of the dataset is allowed to grow with the dimension, so that the dimension is superlogarithmic but subpolynomial in the size of the dataset, and the number of pivots is sublinear in the size of the dataset. We adopt the least restrictive cost model of similarity search, in which each distance calculation counts as a single computation and all other costs are disregarded. We demonstrate that if the intrinsic dimension of the spaces, in the sense of the concentration of measure phenomenon, is linear in the dimension, then the performance of pivot-based indexes for similarity search is asymptotically linear in the size of the dataset.
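
The cost model described in the abstract can be illustrated with a minimal sketch of a generic pivot-based range search. This is not the authors' construction. It is a textbook pivot table that uses the triangle inequality to discard points and counts only query-time distance computations. The names `PivotIndex`, `range_query`, and `experiment`, the Euclidean metric, the uniform data in the unit cube, and the random pivot choice are all assumptions made for illustration.

```python
import math
import random


def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


class PivotIndex:
    """Hypothetical pivot table: stores d(x, p) for every point x and pivot p."""

    def __init__(self, data, num_pivots, dist=euclidean, seed=0):
        rng = random.Random(seed)
        self.data = data
        self.dist = dist
        # Assumption: pivots are chosen uniformly at random from the dataset.
        self.pivots = rng.sample(data, num_pivots)
        # Build-time distances are not charged to the query cost.
        self.table = [[dist(x, p) for p in self.pivots] for x in data]

    def range_query(self, q, radius):
        """Return (indices of points within radius of q, distance computations used)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        cost = len(self.pivots)  # one distance computation per pivot
        hits = []
        for i, row in enumerate(self.table):
            # Triangle inequality: |d(q, p) - d(x, p)| <= d(q, x) for every pivot p.
            lower_bound = max(abs(qp - xp) for qp, xp in zip(q_to_pivots, row))
            if lower_bound > radius:
                continue  # discarded without computing d(q, x)
            cost += 1  # the point survived the filter, so d(q, x) must be computed
            if self.dist(q, self.data[i]) <= radius:
                hits.append(i)
        return hits, cost


def experiment(dim, n=500, num_pivots=8, seed=1):
    """Run one range query whose radius equals the nearest-neighbour distance."""
    rng = random.Random(seed)
    data = [[rng.random() for _ in range(dim)] for _ in range(n)]
    q = [rng.random() for _ in range(dim)]
    radius = min(euclidean(q, x) for x in data)
    index = PivotIndex(data, num_pivots, seed=seed)
    hits, cost = index.range_query(q, radius)
    brute_force = [i for i, x in enumerate(data) if euclidean(q, x) <= radius]
    return hits, brute_force, cost
```

Calling `experiment(2)` and `experiment(64)` and comparing the returned cost with the dataset size of 500 gives a rough feel for the effect: under these assumptions the filter should discard most points in low dimension and very few in high dimension, so the query cost approaches that of a linear scan. A small simulation of this kind only illustrates the asymptotic statement in the abstract and does not substitute for the proof.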