Fast parameterless density-based clustering via random projections

Authors:
Johannes Schneider;Michail Vlachos
Affiliations:
IBM Research - Zurich, Rueschlikon, Switzerland;IBM Research - Zurich, Rueschlikon, Switzerland
Venue:
Proceedings of the 22nd ACM international conference on information & knowledge management
Year:
2013

Citing 10
Cited 0

OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data

A Maximum Variance Cluster Algorithm
IEEE Transactions on Pattern Analysis and Machine Intelligence

Compression, Clustering, and Pattern Discovery in Very High-Dimensional Discrete-Attribute Data Sets
IEEE Transactions on Knowledge and Data Engineering

Clustering aggregation
ACM Transactions on Knowledge Discovery from Data (TKDD)

Graph-Theoretical Methods for Detecting and Describing Gestalt Clusters
IEEE Transactions on Computers

Random projection trees and low dimensional manifolds
STOC '08 Proceedings of the fortieth annual ACM symposium on Theory of computing

Data Clustering: User's Dilemma
MLDM '07 Proceedings of the 5th international conference on Machine Learning and Data Mining in Pattern Recognition

Two-level k-means clustering algorithm for k-τ relationship establishment and linear-time classification
Pattern Recognition

DENCLUE 2.0: fast clustering based on kernel density estimation
IDA'07 Proceedings of the 7th international conference on Intelligent data analysis

Scalable and Memory-Efficient Clustering of Large-Scale Social Networks
ICDM '12 Proceedings of the 2012 IEEE 12th International Conference on Data Mining

Abstract

Clustering offers significant insights in data analysis. Density-based algorithms have emerged as flexible and efficient techniques, able to discover high-quality and potentially irregularly shaped clusters. We present two fast density-based clustering algorithms based on random projections. Both algorithms demonstrate a speedup of one to two orders of magnitude over equivalent state-of-the-art density-based techniques, even for modest-size datasets. We give a comprehensive analysis of both algorithms and show a runtime of O(dN log^2 N) for a d-dimensional dataset of N points. Our first algorithm can be viewed as a fast variant of the OPTICS density-based algorithm, but uses a softer definition of density combined with sampling. The second algorithm is parameterless and identifies areas separating clusters.
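
To make the role of random projections more concrete, the following is a minimal illustrative sketch and not the authors' algorithm: it recursively partitions a point set by projecting it onto a random direction and splitting at the projected value of a randomly chosen point. The function names, the `min_size` stopping rule, and the tie handling are assumptions introduced here for illustration. Each projection costs O(d) per point, which hints at where the factor dN in the stated runtime may come from.

```python
import random


def project(point, direction):
    """Dot product of a point with a direction vector: O(d) work."""
    return sum(p * w for p, w in zip(point, direction))


def split_by_random_projections(points, min_size, rng):
    """Recursively partition point indices using random 1-D projections.

    Hypothetical helper for illustration only. Returns a list of index
    lists (leaves). A leaf holds at most min_size indices unless all of
    its points project to the same value and cannot be separated.
    """
    dim = len(points[0])
    leaves = []
    stack = [list(range(len(points)))]
    while stack:
        idx = stack.pop()
        if len(idx) <= min_size:
            leaves.append(idx)
            continue
        # Draw a random direction and project every point in this subset.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        values = {i: project(points[i], direction) for i in idx}
        # Split at the projected value of a uniformly chosen point.
        pivot = values[rng.choice(idx)]
        left = [i for i in idx if values[i] < pivot]
        right = [i for i in idx if values[i] >= pivot]
        if not left:
            # The pivot was the minimum: move ties with the pivot to the left.
            left = [i for i in idx if values[i] <= pivot]
            right = [i for i in idx if values[i] > pivot]
        if not left or not right:
            # All projected values are identical; stop splitting here.
            leaves.append(idx)
            continue
        stack.extend([left, right])
    return leaves


if __name__ == "__main__":
    rng = random.Random(0)
    data = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(50)]
    for leaf in split_by_random_projections(data, 5, rng):
        print(leaf)
```

Points that end up in the same leaf tend to be close along every sampled direction, so repeating the partition with fresh random directions gives a cheap signal of which points lie near each other. How such a signal is turned into a density estimate or a clustering is not specified here and is left open in this sketch.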