Acceleration of DBSCAN-based clustering with reduced neighborhood evaluations

  • Authors: Andreas Thom; Oliver Kramer
  • Affiliations: Department of Computer Science, Technische Universität Dortmund, Dortmund, Germany; International Computer Science Institute, Berkeley, CA
  • Venue: KI'10 Proceedings of the 33rd annual German conference on Advances in artificial intelligence
  • Year: 2010

Abstract

DBSCAN is a density-based clustering technique that is well suited to discovering clusters of arbitrary shape and to handling noise. The number of clusters does not have to be known in advance. Its runtime is dominated by computing the ε-neighborhood of every point in the data set. Some methods address this by reducing the query complexity of the nearest neighbor search; others concentrate on reducing the number of ε-neighborhood evaluations that are needed. In this paper we propose a heuristic that selects a reduced set of points for the neighborhood search and uses efficient data structures and algorithms to reduce the runtime significantly. Unlike previous approaches, the number of necessary evaluations is independent of the dimensionality of the data space. We evaluate the performance of the new approach experimentally on artificial test cases and on problems from the UCI machine learning repository.
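
To make the bottleneck concrete, the following is a minimal sketch of baseline DBSCAN with a brute-force ε-neighborhood query and a counter for the number of neighborhood evaluations. It is not the heuristic proposed in the paper, whose details are not given in the abstract. The names `region_query`, `dbscan`, `eps`, and `min_pts` are illustrative assumptions, and the toy data is made up for the example.

```python
import math

def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i] (brute force, O(n))."""
    return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

def dbscan(points, eps, min_pts):
    """Baseline DBSCAN (illustrative sketch, not the paper's heuristic).

    Returns (labels, evaluations): labels[i] is a cluster id >= 0 or -1 for
    noise; evaluations counts how many eps-neighborhoods were computed.
    """
    UNVISITED, NOISE = -2, -1
    labels = [UNVISITED] * len(points)
    evaluations = 0
    cluster = -1
    for i in range(len(points)):
        if labels[i] != UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        evaluations += 1
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may later be relabeled as a border point
            continue
        cluster += 1
        labels[i] = cluster
        seeds = [j for j in neighbors if j != i]
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins the cluster, not expanded
            if labels[j] != UNVISITED:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            evaluations += 1
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is a core point: keep expanding
    return labels, evaluations

# Toy data (assumed for illustration): two compact groups and one outlier.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels, evaluations = dbscan(points, eps=1.5, min_pts=3)
print(labels, evaluations)
```

In this baseline every point triggers exactly one neighborhood evaluation, so the counter always equals the number of points. Approaches of the kind described in the abstract aim to lower that count, while index structures instead aim to make each individual query cheaper.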