Indexability, concentration, and VC theory
Journal of Discrete Algorithms
We offer a theoretical validation of the curse of dimensionality in pivot-based indexing of datasets for similarity search by proving, in the framework of statistical learning, that in high dimensions no pivot-based indexing scheme can essentially outperform a linear scan. We study the asymptotic performance of pivot-based indexing schemes on a sequence of datasets modeled as i.i.d. samples drawn from a sequence of metric spaces. The size of the dataset is allowed to grow with the dimension, in such a way that the dimension is superlogarithmic but subpolynomial in the size of the dataset, and the number of pivots is sublinear in the size of the dataset. We adopt the least restrictive cost model of similarity search, in which each distance calculation counts as a single computation and all other work is disregarded. We demonstrate that if the intrinsic dimension of the spaces, in the sense of the concentration of measure phenomenon, is linear in the dimension, then the performance of pivot-based indexes for similarity search is asymptotically linear in the size of the dataset.
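To make the cost model concrete, here is a minimal sketch of a pivot-based range query that counts only distance computations. It is an illustration under stated assumptions and not the construction analyzed in the paper: the Euclidean metric, the uniform sampling helper, and all function names (`euclidean`, `sample_cube`, `build_pivot_table`, `range_query`) are hypothetical choices made for this example. A point is discarded without a distance computation when the triangle-inequality lower bound derived from the pivots already exceeds the query radius.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance; assumed here, any metric would serve."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def sample_cube(n, dim, rng):
    """Draw n i.i.d. points uniformly from the unit cube [0, 1]^dim."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def build_pivot_table(data, pivots, dist=euclidean):
    """Precompute d(x, p) for every data point x and pivot p (index build)."""
    return [[dist(x, p) for p in pivots] for x in data]


def range_query(query, radius, data, pivots, table, dist=euclidean):
    """Return (hits, cost), where cost counts query-time distance computations."""
    q_to_pivots = [dist(query, p) for p in pivots]
    cost = len(pivots)  # one distance computation per pivot
    hits = []
    for x, row in zip(data, table):
        # Triangle inequality: d(q, x) >= |d(q, p) - d(x, p)| for every pivot p.
        lower_bound = max(
            (abs(dq - dx) for dq, dx in zip(q_to_pivots, row)), default=0.0
        )
        if lower_bound > radius:
            continue  # pruned without computing d(q, x)
        cost += 1  # the point survived pruning, so d(q, x) must be computed
        if dist(query, x) <= radius:
            hits.append(x)
    return hits, cost
```

One way to use the sketch is to call `sample_cube(n, dim, random.Random(seed))` for several values of `dim`, take the first few points as pivots, and compare the returned `cost` with `n`. A cost close to `n` plus the number of pivots means the index did no better than a linear scan on that query; such a toy run only illustrates the cost model and is not evidence for the theorem.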