Fast parameterless density-based clustering via random projections

  • Authors:
  • Johannes Schneider;Michail Vlachos

  • Affiliations:
  • IBM Research - Zurich, Rueschlikon, Switzerland;IBM Research - Zurich, Rueschlikon, Switzerland

  • Venue:
  • Proceedings of the 22nd ACM international conference on Conference on information & knowledge management
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clustering offers significant insights in data analysis. Density based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped- clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate one to two orders of magnitude speedup compared to equivalent state-of-art density based techniques, even for modest-size datasets. We give a comprehensive analysis of both our algorithms and show runtime of O(dNlog2 N), for a d-dimensional dataset. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but using a softer definition of density combined with sampling. The second algorithm is parameter-less, and identifies areas separating clusters.