Indexing schemes for similarity search: an illustrated paradigm

Authors:
Vladimir Pestov;Aleksandar Stojmirović
Affiliations:
Department of Mathematics and Statistics, University of Ottawa, Ontario, Canada;Department of Mathematics and Statistics, University of Ottawa, Ontario, Canada
Venue:
Fundamenta Informaticae
Year:
2005

Citing 13
Cited 5

Database metatheory: asking the big queries

PODS '95 Proceedings of the fourteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
On the analysis of indexing schemes

PODS '97 Proceedings of the sixteenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
A cost model for similarity queries in metric spaces

PODS '98 Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems
Approximate nearest neighbors: towards removing the curse of dimensionality

STOC '98 Proceedings of the thirtieth annual ACM symposium on Theory of computing
Data structures and algorithms for nearest neighbor search in general metric spaces

SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
On the geometry of similarity search: dimensionality curse and concentration of measure

Information Processing Letters
The "DGX" distribution for mining massive, skewed data

Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining
Searching in metric spaces

ACM Computing Surveys (CSUR)
On a model of indexability and its bounds for range queries

Journal of the ACM (JACM)
Lectures on Discrete Geometry

Lectures on Discrete Geometry
A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces

VLDB '98 Proceedings of the 24th International Conference on Very Large Data Bases
Near Neighbor Search in Large Metric Spaces

VLDB '95 Proceedings of the 21st International Conference on Very Large Data Bases
A Geometric Framework for Modelling Similarity Search

DEXA '99 Proceedings of the 10th International Workshop on Database & Expert Systems Applications

Indexing schemes for similarity search in datasets of short protein fragments

Information Systems
A flexible framework to ease nearest neighbor search in multidimensional data spaces

Data & Knowledge Engineering
Indexability, concentration, and VC theory

Proceedings of the Third International Conference on SImilarity Search and APplications
Lower bounds on performance of metric tree indexing schemes for exact similarity search in high dimensions

Proceedings of the Fourth International Conference on SImilarity Search and APplications
Indexability, concentration, and VC theory

Journal of Discrete Algorithms

Abstract

We suggest a variation of the Hellerstein-Koutsoupias-Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. In particular, this provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links between the performance of indexing schemes and high-dimensional geometry. The concepts and results are illustrated on a very large concrete dataset of peptide fragments equipped with a biologically significant similarity measure.