Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions

Authors:
Vladimir Pestov
Affiliations:
Universidade Federal de Santa Catarina, Florianópolis-SC, Brasil and University of Ottawa, Ontario, Canada
Venue:
Proceedings of the Fourth International Conference on SImilarity Search and APplications
Year:
2011

Citing 35
Cited 1

Asymptotic theory of finite dimensional normed spaces

Asymptotic theory of finite dimensional normed spaces
Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers

Machine Learning - Special issue on COLT '93
The nature of statistical learning theory

The nature of statistical learning theory
The SR-tree: an index structure for high-dimensional nearest neighbor queries

SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A cost model for similarity queries in metric spaces

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Lower bounds for high dimensional nearest neighbor search and related problems

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Data structures and algorithms for nearest neighbor search in general metric spaces

SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Tighter bounds for nearest neighbor search and related problems in the cell probe model

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
On the geometry of similarity search: dimensionality curse and concentration of measure

Information Processing Letters
Exploratory image databases: content-based retrieval

Exploratory image databases: content-based retrieval
Searching in metric spaces

ACM Computing Surveys (CSUR)
On a model of indexability and its bounds for range queries

Journal of the ACM (JACM)
Learning in Neural Networks: Theoretical Foundations

Learning in Neural Networks: Theoretical Foundations
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces

SIAM Journal on Computing
Fast Nearest-Neighbor Search in Dissimilarity Spaces

IEEE Transactions on Pattern Analysis and Machine Intelligence
Similarity Indexing with the SS-tree

ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
When Is ''Nearest Neighbor'' Meaningful?

ICDT '99 Proceedings of the 7th International Conference on Database Theory
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces

VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
A few notes on statistical learning theory

Advanced lectures on machine learning
Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling)

Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling)
A compact space decomposition for effective metric indexing

Pattern Recognition Letters
Similarity Search: The Metric Space Approach (Advances in Database Systems)

Similarity Search: The Metric Space Approach (Advances in Database Systems)
Indexing schemes for similarity search: an illustrated paradigm

Fundamenta Informaticae
Theory of nearest neighbors indexability

ACM Transactions on Database Systems (TODS)
On the Optimality of the Dimensionality Reduction Method

FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Higher Lower Bounds for Near-Neighbor and Further Rich Problems

FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Indexing schemes for similarity search in datasets of short protein fragments

Information Systems
2008 Special Issue: An axiomatic approach to intrinsic dimension of a dataset

Neural Networks
A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match

FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
Analyzing Metric Space Indexes: What For?

SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Curse of Dimensionality in Pivot Based Indexes

SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Learning and Generalization: With Applications to Neural Networks

Learning and Generalization: With Applications to Neural Networks
Indexability, concentration, and VC theory

Journal of Discrete Algorithms

Indexability, concentration, and VC theory

Journal of Discrete Algorithms


Abstract

Within a mathematically rigorous model borrowed from statistical learning theory, we analyse the curse of dimensionality for similarity based information retrieval in the context of popular indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance, ρ, and an underlying probability distribution, μ. While performing an asymptotic analysis, we send the intrinsic dimension d of Ω to infinity, and assume that the size of a dataset, n, grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points are subject to the same probability distribution μ as datapoints. Let F denote a class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of all sets {ω: ƒ(ω) ≥ a}, a ∈ R is d^{O(1)}. (In view of a 1995 result of Goldberg and Jerrum, this is a reasonable complexity assumption.) We deduce superpolynomial in d lower bounds on the expected average case performance of hierarchical metric-tree based indexing schemes for exact similarity search in (Ω, X).
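To make the setting concrete, the following is a minimal sketch of one familiar kind of metric tree, a vantage-point style tree, and not the construction analysed in the paper. It assumes Euclidean distance as ρ and uses decision functions of the form ƒ(ω) = ρ(ω, pivot), which are 1-Lipschitz by the triangle inequality. The names `dist`, `Node`, `build`, and `nearest` are hypothetical, chosen for illustration only. The `visited` counter records how many nodes an exact nearest-neighbour query touches, which is the kind of cost the lower bounds concern.

```python
import math
import random


def dist(a, b):
    """Euclidean distance, standing in for the metric rho (an assumption)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """One tree node: a pivot, a threshold, and two subtrees."""

    def __init__(self, pivot, threshold, inside, outside):
        self.pivot = pivot
        self.threshold = threshold
        self.inside = inside    # points with dist(pivot, p) < threshold
        self.outside = outside  # points with dist(pivot, p) >= threshold


def build(points, rng):
    """Recursively split points by the decision function f(p) = dist(pivot, p)."""
    if not points:
        return None
    pts = list(points)
    pivot = pts.pop(rng.randrange(len(pts)))
    if not pts:
        return Node(pivot, 0.0, None, None)
    ds = sorted(dist(pivot, p) for p in pts)
    threshold = ds[len(ds) // 2]  # median distance to the pivot
    inside = [p for p in pts if dist(pivot, p) < threshold]
    outside = [p for p in pts if dist(pivot, p) >= threshold]
    return Node(pivot, threshold, build(inside, rng), build(outside, rng))


def nearest(node, q, best=None, stats=None):
    """Exact nearest-neighbour search; returns (distance, point)."""
    if node is None:
        return best
    if stats is not None:
        stats["visited"] += 1
    d = dist(q, node.pivot)
    if best is None or d < best[0]:
        best = (d, node.pivot)
    # Descend first into the side of the split that contains q.
    if d < node.threshold:
        near, far = node.inside, node.outside
    else:
        near, far = node.outside, node.inside
    best = nearest(near, q, best, stats)
    # f is 1-Lipschitz, so every point p on the far side satisfies
    # dist(q, p) >= |f(q) - threshold|. The far side can be skipped
    # only when that gap exceeds the best distance found so far.
    if abs(d - node.threshold) <= best[0]:
        best = nearest(far, q, best, stats)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 8, 32):
        data = [tuple(rng.random() for _ in range(dim)) for _ in range(500)]
        tree = build(data, rng)
        stats = {"visited": 0}
        query = tuple(rng.random() for _ in range(dim))
        nearest(tree, query, None, stats)
        print(dim, stats["visited"])
```

Running the script prints, for each of a few illustrative dimensions, the number of nodes a single query visits. The pruning test is where the 1-Lipschitz property does its work: a subtree is discarded only when the gap between ƒ(query) and the threshold is larger than the current best distance.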