Can shared-neighbor distances defeat the curse of dimensionality?

  • Authors:
  • Michael E. Houle; Hans-Peter Kriegel; Peer Kröger; Erich Schubert; Arthur Zimek

  • Affiliations:
  • National Institute of Informatics, Tokyo, Japan; Ludwig-Maximilians-Universität München, München, Germany; Ludwig-Maximilians-Universität München, München, Germany; Ludwig-Maximilians-Universität München, München, Germany; Ludwig-Maximilians-Universität München, München, Germany

  • Venue:
  • SSDBM'10: Proceedings of the 22nd International Conference on Scientific and Statistical Database Management
  • Year:
  • 2010

Abstract

The performance of similarity measures for search, indexing, and data mining applications tends to degrade rapidly as the dimensionality of the data increases. The effects of the so-called 'curse of dimensionality' have been studied by researchers for data sets generated according to a single data distribution. In this paper, we study the effects of this phenomenon on different similarity measures for multiply-distributed data. In particular, we assess the performance of shared-neighbor similarity measures, which are secondary similarity measures based on the rankings of data objects induced by some primary distance measure. We find that rank-based similarity measures can result in more stable performance than their associated primary distance measures.
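To make the idea of a secondary, rank-based similarity concrete, the following is a minimal sketch of one common shared-neighbor formulation: the similarity of two objects is the size of the overlap between their k-nearest-neighbor lists, divided by k. It is an illustration only, not necessarily the exact measure evaluated in the paper. The function names `knn_indices` and `snn_similarity`, the toy data, the use of Euclidean distance as the primary measure, and the choice to exclude an object from its own neighbor list are all assumptions made for this example.

```python
import math


def knn_indices(points, i, k):
    """Return the indices of the k nearest neighbors of points[i].

    Assumptions: Euclidean distance is the primary distance measure,
    the query point itself is excluded, and ties are broken by index.
    """
    candidates = (j for j in range(len(points)) if j != i)
    ranked = sorted(candidates, key=lambda j: (math.dist(points[i], points[j]), j))
    return set(ranked[:k])


def snn_similarity(points, i, j, k):
    """Shared-neighbor similarity in [0, 1]: overlap of the two k-NN lists over k."""
    shared = knn_indices(points, i, k) & knn_indices(points, j, k)
    return len(shared) / k


# Toy data: two well-separated groups of three points each.
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
    (10.0, 10.0), (10.0, 11.0), (11.0, 10.0),
]

same_group = snn_similarity(points, 0, 1, k=2)   # points from the same group
cross_group = snn_similarity(points, 0, 3, k=2)  # points from different groups
print(same_group, cross_group)
```

Because the score depends only on the neighbor rankings and not on the magnitudes of the primary distances, it is unaffected by any change to the primary distances that leaves those rankings intact.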