Measuring the difficulty of distance-based indexing

  • Authors:
  • Matthew Skala

  • Affiliations:
  • University of Waterloo, Waterloo, Ontario, Canada

  • Venue:
  • SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

Data structures for similarity search are commonly evaluated on data in vector spaces, but distance-based data structures are also applicable to non-vector spaces with no natural concept of dimensionality. The intrinsic dimensionality statistic of Chávez and Navarro provides a way to compare the performance of similarity indexing and search algorithms across different spaces, and predict the performance of index data structures on non-vector spaces by relating them to equivalent vector spaces. We characterise its asymptotic behaviour, and give experimental results to calibrate these comparisons.