A Fast Parallel Clustering Algorithm for Large Spatial Databases

  • Authors:
  • Xiaowei Xu;Jochen Jäger;Hans-Peter Kriegel

  • Affiliations:
  • Corporate Technology, Siemens AG, Otto-Hahn-Ring 6, D-81730 München, Germany. Xiaowei.Xu@mchp.siemens.de;Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany. jaeger@informatik.uni-muenchen.de;Institute for Computer Science, University of Munich, Oettingenstr. 67, D-80538 München, Germany. kriegel@informatik.uni-muenchen.de

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 1999

Abstract

The clustering algorithm DBSCAN relies on a density-based notion of clusters and is designed to discover clusters of arbitrary shape as well as to distinguish noise. In this paper, we present PDBSCAN, a parallel version of this algorithm. We use the ‘shared-nothing’ architecture with multiple computers interconnected through a network. A fundamental component of a shared-nothing system is its distributed data structure. We introduce the dR*-tree, a distributed spatial index structure in which the data is spread among multiple computers and the indexes of the data are replicated on every computer. We implemented our method using a number of workstations connected via Ethernet (10 Mbit). A performance evaluation shows that PDBSCAN offers nearly linear speedup and has excellent scaleup and sizeup behavior.
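To make the density-based notion of clusters concrete, the following is a minimal sketch of sequential DBSCAN in Python. It is not the authors' PDBSCAN and does not use a dR*-tree: the brute-force `region_query` is an assumption that stands in for a spatial index lookup, and the function and variable names (`dbscan`, `region_query`, `eps`, `min_pts`, `NOISE`) are chosen here for illustration only.

```python
from math import dist  # Euclidean distance, Python 3.8+

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet processed


def region_query(points, i, eps):
    """Return indices of all points within eps of points[i] (includes i).

    Brute force, O(n) per query. A spatial index would normally answer this;
    the linear scan is an assumption made to keep the sketch short.
    """
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [UNVISITED] * len(points)
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            # Not a core point; may be relabeled later as a border point.
            labels[i] = NOISE
            continue
        # points[i] is a core point: start a new cluster and expand it.
        labels[i] = cluster_id
        seeds = list(neighbors)
        while seeds:
            j = seeds.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:
                seeds.extend(j_neighbors)  # j is also a core point
        cluster_id += 1
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (50, 50)]
    print(dbscan(pts, eps=1.5, min_pts=3))
```

In a shared-nothing setting as described in the abstract, each computer would hold only part of the data, so a region query near a partition boundary would also need points stored elsewhere; how PDBSCAN handles this is not covered by this sketch.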