Measuring the difficulty of distance-based indexing

Authors:
Matthew Skala
Affiliations:
University of Waterloo, Waterloo, Ontario, Canada
Venue:
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Year:
2005

Citing 9

The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data

A new challenge for compression algorithms: genetic sequences
Information Processing and Management: an International Journal - Special issue: data compression

The SR-tree: an index structure for high-dimensional nearest neighbor queries
SIGMOD '97 Proceedings of the 1997 ACM SIGMOD international conference on Management of data

The pyramid-technique: towards breaking the curse of dimensionality
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data

Data structures and algorithms for nearest neighbor search in general metric spaces
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms

Indexing large metric spaces for similarity search queries
ACM Transactions on Database Systems (TODS)

R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data

The R+-Tree: A Dynamic Index for Multi-Dimensional Objects
VLDB '87 Proceedings of the 13th International Conference on Very Large Data Bases

Proximity Matching Using Fixed-Queries Trees
CPM '94 Proceedings of the 5th Annual Symposium on Combinatorial Pattern Matching

Cited 4

The Concentration of Fractional Distances
IEEE Transactions on Knowledge and Data Engineering

Counting distance permutations
Journal of Discrete Algorithms

CoPhIR Image Collection under the Microscope
SISAP '09 Proceedings of the 2009 Second International Workshop on Similarity Search and Applications

Negative selection algorithm based on grid file of the feature space
Knowledge-Based Systems

Abstract

Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the performance of index data structures on non-vector spaces by relating them to equivalent vector spaces. We characterise its asymptotic behaviour, and give experimental results to calibrate these comparisons.
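
As a rough illustration of the statistic the abstract refers to: the intrinsic dimensionality of Chávez and Navarro is usually stated as rho = mu^2 / (2 sigma^2), where mu and sigma^2 are the mean and variance of the distance between two random points of the space. The sketch below estimates it from the pairwise distances of a finite sample. The function names, the use of all sample pairs, and the choice of population variance are assumptions made for this example; they are not taken from the paper.

```python
import math
import random
import statistics
from itertools import combinations


def intrinsic_dimensionality(points, distance):
    """Estimate rho = mean^2 / (2 * variance) of pairwise distances.

    Assumption: every unordered pair of sample points is used, and the
    population variance of those distances stands in for sigma^2.
    """
    dists = [distance(a, b) for a, b in combinations(points, 2)]
    mean = statistics.fmean(dists)
    var = statistics.pvariance(dists, mu=mean)
    return mean * mean / (2.0 * var)


def uniform_cube_sample(n, dim, rng):
    """Draw n points uniformly from the unit cube [0, 1]^dim."""
    return [tuple(rng.random() for _ in range(dim)) for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    for dim in (2, 4, 8, 16):
        sample = uniform_cube_sample(200, dim, rng)
        rho = intrinsic_dimensionality(sample, math.dist)
        print(f"vector dimension {dim:2d}: estimated rho = {rho:.2f}")
```

Because the function takes any distance callable, the same estimate can be computed for a non-vector space (for example, strings under edit distance) and set against the values obtained for vector spaces of known dimension, which is the kind of comparison the abstract describes.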