The EM algorithm for graphical association models with missing data
Computational Statistics & Data Analysis - Special issue dedicated to Toma´sˇ Havra´nek
Automatic subspace clustering of high dimensional data for data mining applications
SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data
KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
Finding generalized projected clusters in high dimensional spaces
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
SCHISM: A New Approach for Interesting Subspace Mining
ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
P3C: A Robust Projected Clustering Algorithm
ICDM '06 Proceedings of the Sixth International Conference on Data Mining
Data Mining: Practical Machine Learning Tools and Techniques, Second Edition (Morgan Kaufmann Series in Data Management Systems)
DUSC: Dimensionality Unbiased Subspace Clustering
ICDM '07 Proceedings of the 2007 Seventh IEEE International Conference on Data Mining
HSM: Heterogeneous Subspace Mining in High Dimensional Data
SSDBM 2009 Proceedings of the 21st International Conference on Scientific and Statistical Database Management
Projected Gustafson Kessel Clustering
RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Projected Gustafson-Kessel clustering algorithm and its convergence
Transactions on rough sets XIV
DB-CSC: a density-based approach for subspace clustering in graphs with feature vectors
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part I
Scalable density-based subspace clustering
Proceedings of the 20th ACM international conference on Information and knowledge management
A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining
A survey on enhanced subspace clustering
Data Mining and Knowledge Discovery
Projective clustering ensembles
Data Mining and Knowledge Discovery
Hi-index | 0.00 |
Subspace clustering mines clusters hidden in subspaces of high-dimensional data sets. Density-based approaches have been shown to successfully mine clusters of arbitrary shape even in the presence of noise in full space clustering. Exhaustive search of all density-based subspace clusters, however, results in infeasible runtimes for large high-dimensional data sets. This is due to the exponential number of possible subspace projections in addition to the high computational cost of density-based clustering. In this paper, we propose lossless efficient detection of density-based subspace clusters. In our EDSC (efficient density-based subspace clustering) algorithm we reduce the high computational cost of density-based subspace clustering by a complete multistep filter-and-refine algorithm. Our first hypercube filter step avoids exhaustive search of all regions in all subspaces by enclosing potentially density-based clusters in hypercubes. Our second filter step provides additional pruning based on a density monotonicity property. In the final refinement step, the exact unbiased density-based subspace clustering result is detected. As we prove that pruning is lossless in both filter steps, we guarantee completeness of the result. In thorough experiments on synthetic and real world data sets, we demonstrate substantial efficiency gains. Our lossless EDSC approach outperforms existing density-based subspace clustering algorithms by orders of magnitude.