Pivot selection techniques for proximity searching in metric spaces

  • Authors:
  • Benjamin Bustos; Gonzalo Navarro; Edgar Chávez

  • Affiliations:
  • Department of Computer and Information Science, University of Konstanz, Universitaetstrasse 10, Box D 78, 78457 Konstanz, Germany and Center for Web Research, Department of Computer Science, Unive ...; Center for Web Research, Department of Computer Science, University of Chile, Blanco Encalada 2120, Santiago, Chile; Escuela de Ciencias Físico-Matemáticas, Universidad Michoacana, Edificio "B", Ciudad Universitaria, Morelia, Mich., Mexico

  • Venue:
  • Pattern Recognition Letters
  • Year:
  • 2003

Abstract

With few exceptions, pivot-based proximity search algorithms for metric spaces select their pivots at random from among the objects of the space. However, it is well known that the way pivots are selected can drastically affect the performance of the algorithm. Between two pivot sets of the same size, the better-chosen one can substantially reduce search time. Alternatively, a small, well-chosen set of pivots (requiring much less space) can yield the same efficiency as a larger, randomly chosen set. We propose an efficiency measure for comparing two pivot sets, combined with an optimization technique that allows us to select good sets of pivots. We obtain abundant empirical evidence that our technique is effective; to our knowledge, it is the first to produce consistently good results in a wide variety of cases and to be based on a formal theory. We show that good pivots are outliers, but that selecting outliers does not ensure that good pivots are selected.
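
The abstract does not spell out the efficiency measure or the optimization technique. The Python sketch below is an illustration under two stated assumptions, not the authors' implementation. It assumes that a pivot set is scored by the mean, over sampled object pairs, of the lower bound max_p |d(x, p) - d(y, p)| that the triangle inequality gives on d(x, y), and that pivots are picked greedily one at a time from a small random candidate sample. All function names and parameters (`lower_bound`, `mean_lower_bound`, `select_incremental`, `candidates_per_step`) are hypothetical.

```python
import random


def euclidean(x, y):
    """Example metric; any function satisfying the triangle inequality works."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5


def lower_bound(x, y, pivots, dist):
    """Pivot-based lower bound on dist(x, y).

    By the triangle inequality, |d(x, p) - d(y, p)| <= d(x, y) for every
    pivot p, so the maximum over all pivots is also a lower bound.
    """
    return max(abs(dist(x, p) - dist(y, p)) for p in pivots)


def mean_lower_bound(pivots, pairs, dist):
    """ASSUMED efficiency measure: mean lower bound over sampled pairs.

    A higher mean means the bounds are tighter on average, so more objects
    can be discarded without computing their true distance to the query.
    """
    return sum(lower_bound(x, y, pivots, dist) for x, y in pairs) / len(pairs)


def select_incremental(objects, k, pairs, dist, candidates_per_step=10, rng=None):
    """ASSUMED optimization: greedy incremental pivot selection.

    At each step, draw a random sample of candidates and keep the one that
    maximizes the measure when added to the pivots chosen so far.
    """
    rng = rng or random.Random(0)
    pivots = []
    for _ in range(k):
        pool = [o for o in objects if o not in pivots]
        sample = rng.sample(pool, min(candidates_per_step, len(pool)))
        best = max(
            sample,
            key=lambda c: mean_lower_bound(pivots + [c], pairs, dist),
        )
        pivots.append(best)
    return pivots
```

Because the bound is a maximum over pivots, adding a pivot can never lower the score under this assumed measure. Comparing two pivot sets of equal size then amounts to comparing their `mean_lower_bound` values on the same sample of pairs.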