EDSC: efficient density-based subspace clustering

Authors:
Ira Assent;Ralph Krieger;Emmanuel Müller;Thomas Seidl
Affiliations:
RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany;RWTH Aachen University, Aachen, Germany
Venue:
Proceedings of the 17th ACM conference on Information and knowledge management
Year:
2008

Citing 10
Cited 8

The EM algorithm for graphical association models with missing data

Computational Statistics & Data Analysis - Special issue dedicated to Toma´sˇ Havra´nek
Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces

SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SCHISM: A New Approach for Interesting Subspace Mining

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
P3C: A Robust Projected Clustering Algorithm

ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)

Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
DUSC: Dimensionality Unbiased Subspace Clustering

ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining

HSM: Heterogeneous Subspace Mining in High Dimensional Data

SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Projected Gustafson Kessel Clustering

RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Projected Gustafson-Kessel clustering algorithm and its convergence

Transactions on rough sets XIV
DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors

ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Scalable density-based subspace clustering

Proceedings of the 20th ACM international conference on Information and knowledge management
A survey on unsupervised outlier detection in high-dimensional numerical data

Statistical Analysis and Data Mining
A survey on enhanced subspace clustering

Data Mining and Knowledge Discovery
Projective clustering ensembles

Data Mining and Knowledge Discovery

Quantified Score

Hi-index	0.00

Visualization

Abstract

Subspace clustering mines clusters hidden in subspaces of high-dimensional data sets. Density-based approaches have been shown to successfully mine clusters of arbitrary shape even in the presence of noise in full space clustering. Exhaustive search of all density-based subspace clusters, however, results in infeasible runtimes for large high-dimensional data sets. This is due to the exponential number of possible subspace projections in addition to the high computational cost of density-based clustering. In this paper, we propose lossless efficient detection of density-based subspace clusters. In our EDSC (efficient density-based subspace clustering) algorithm we reduce the high computational cost of density-based subspace clustering by a complete multistep filter-and-refine algorithm. Our first hypercube filter step avoids exhaustive search of all regions in all subspaces by enclosing potentially density-based clusters in hypercubes. Our second filter step provides additional pruning based on a density monotonicity property. In the final refinement step, the exact unbiased density-based subspace clustering result is detected. As we prove that pruning is lossless in both filter steps, we guarantee completeness of the result. In thorough experiments on synthetic and real world data sets, we demonstrate substantial efficiency gains. Our lossless EDSC approach outperforms existing density-based subspace clustering algorithms by orders of magnitude.