Asymptotic theory of finite dimensional normed spaces
Bounding the Vapnik-Chervonenkis Dimension of Concept Classes Parameterized by Real Numbers
Machine Learning - Special issue on COLT '93
The nature of statistical learning theory
The SR-tree: an index structure for high-dimensional nearest neighbor queries
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data
A cost model for similarity queries in metric spaces
PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Approximate nearest neighbors: towards removing the curse of dimensionality
STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Lower bounds for high dimensional nearest neighbor search and related problems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Data structures and algorithms for nearest neighbor search in general metric spaces
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Tighter bounds for nearest neighbor search and related problems in the cell probe model
STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
On the geometry of similarity search: dimensionality curse and concentration of measure
Information Processing Letters
Exploratory image databases: content-based retrieval
ACM Computing Surveys (CSUR)
On a model of indexability and its bounds for range queries
Journal of the ACM (JACM)
Learning in Neural Networks: Theoretical Foundations
Efficient Search for Approximate Nearest Neighbor in High Dimensional Spaces
SIAM Journal on Computing
Fast Nearest-Neighbor Search in Dissimilarity Spaces
IEEE Transactions on Pattern Analysis and Machine Intelligence
Similarity Indexing with the SS-tree
ICDE '96 Proceedings of the Twelfth International Conference on Data Engineering
When Is "Nearest Neighbor" Meaningful?
ICDT '99 Proceedings of the 7th International Conference on Database Theory
M-tree: An Efficient Access Method for Similarity Search in Metric Spaces
VLDB '97 Proceedings of the 23rd International Conference on Very Large Data Bases
VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
A few notes on statistical learning theory
Advanced lectures on machine learning
Foundations of Multidimensional and Metric Data Structures (The Morgan Kaufmann Series in Computer Graphics and Geometric Modeling)
A compact space decomposition for effective metric indexing
Pattern Recognition Letters
Similarity Search: The Metric Space Approach (Advances in Database Systems)
Indexing schemes for similarity search: an illustrated paradigm
Fundamenta Informaticae
Theory of nearest neighbors indexability
ACM Transactions on Database Systems (TODS)
On the Optimality of the Dimensionality Reduction Method
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Higher Lower Bounds for Near-Neighbor and Further Rich Problems
FOCS '06 Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science
Indexing schemes for similarity search in datasets of short protein fragments
Information Systems
A Geometric Approach to Lower Bounds for Approximate Near-Neighbor Search and Partial Match
FOCS '08 Proceedings of the 2008 49th Annual IEEE Symposium on Foundations of Computer Science
Analyzing Metric Space Indexes: What For?
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Curse of Dimensionality in Pivot Based Indexes
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications
Learning and Generalization: With Applications to Neural Networks
Indexability, concentration, and VC theory
Journal of Discrete Algorithms
Within a mathematically rigorous model borrowed from statistical learning theory, we analyse the curse of dimensionality for similarity-based information retrieval in the context of a popular class of indexing schemes: metric trees. The datasets X are sampled randomly from a domain Ω, equipped with a distance ρ and an underlying probability distribution μ. In the asymptotic analysis, we send the intrinsic dimension d of Ω to infinity and assume that the size n of a dataset grows superpolynomially yet subexponentially in d. Exact similarity search refers to finding the nearest neighbour in the dataset X to a query point ω ∈ Ω, where the query points follow the same probability distribution μ as the datapoints. Let F denote the class of all 1-Lipschitz functions on Ω that can be used as decision functions in constructing a hierarchical metric tree indexing scheme. Suppose the VC dimension of the class of all sets {ω : ƒ(ω) ≥ a}, ƒ ∈ F, a ∈ ℝ, is d^{O(1)}. (In view of a 1995 result of Goldberg and Jerrum, this is a reasonable complexity assumption.) We deduce lower bounds, superpolynomial in d, on the expected average-case performance of hierarchical metric-tree-based indexing schemes for exact similarity search in (Ω, X).
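
As an illustration of the kind of scheme the abstract refers to, here is a minimal sketch of one metric tree, a vantage-point-style tree, written in Python. It is not the construction analysed in the paper. The names euclid, Node, build and nearest are hypothetical, and the Euclidean distance stands in for a general ρ. Each node uses the decision function ƒ(ω) = ρ(ω, pivot), which is 1-Lipschitz by the triangle inequality, and the search discards a branch only when that inequality rules out a closer point.

```python
import math


def euclid(a, b):
    """Euclidean distance, used here as an assumed stand-in for rho."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class Node:
    """One tree node: a pivot, a threshold radius, and two subtrees."""

    def __init__(self, pivot, radius, inner, outer):
        self.pivot = pivot
        self.radius = radius
        self.inner = inner  # points p with rho(pivot, p) < radius
        self.outer = outer  # points p with rho(pivot, p) >= radius


def build(points, rho):
    """Recursively split the points by distance to the first point (the pivot)."""
    if not points:
        return None
    pivot, rest = points[0], points[1:]
    if not rest:
        return Node(pivot, 0.0, None, None)
    dists = [(rho(pivot, p), p) for p in rest]
    # Threshold at a median distance so both sides are roughly balanced.
    radius = sorted(d for d, _ in dists)[len(dists) // 2]
    inner = [p for d, p in dists if d < radius]
    outer = [p for d, p in dists if d >= radius]
    return Node(pivot, radius, build(inner, rho), build(outer, rho))


def nearest(node, q, rho, best=None, stats=None):
    """Exact nearest-neighbour search; returns (distance, point) or None."""
    if node is None:
        return best
    d = rho(q, node.pivot)  # value of the decision function f(q)
    if stats is not None:
        stats["dist"] += 1  # count distance evaluations
    if best is None or d < best[0]:
        best = (d, node.pivot)
    if d < node.radius:
        near, far = node.inner, node.outer
    else:
        near, far = node.outer, node.inner
    best = nearest(near, q, rho, best, stats)
    # Every point on the far side is at distance >= |d - radius| from q,
    # so that side is visited only if it could still hold a closer point.
    if abs(d - node.radius) < best[0]:
        best = nearest(far, q, rho, best, stats)
    return best
```

Passing a dictionary such as stats = {"dist": 0} to nearest counts how many distance evaluations a query needs. Comparing that count with the dataset size for sample points of increasing dimension is one informal way to observe the effect the abstract describes, although such an experiment does not establish the stated lower bounds.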