We suggest a variation of the Hellerstein–Koutsoupias–Papadimitriou indexability model for datasets equipped with a similarity measure, with the aim of better understanding the structure of indexing schemes for similarity-based search and the geometry of similarity workloads. In particular, this provides a unified approach to a great variety of schemes used to index into metric spaces and facilitates their transfer to more general similarity measures such as quasi-metrics. We discuss links between the performance of indexing schemes and high-dimensional geometry. The concepts and results are illustrated on a very large concrete dataset of peptide fragments equipped with a biologically significant similarity measure.