A Generic Framework for Efficient Subspace Clustering of High-Dimensional Data

Authors:
Hans-Peter Kriegel;Peer Kroger;Matthias Renz;Sebastian Wurst
Affiliations:
University of Munich;University of Munich;University of Munich;University of Munich
Venue:
ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Year:
2005

Citing 9
Cited 27

Automatic subspace clustering of high dimensional data for data mining applications

SIGMOD '98 Proceedings of the 1998 ACM SIGMOD international conference on Management of data
Fast algorithms for projected clustering

SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Entropy-based subspace clustering for mining numerical data

KDD '99 Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining
A Monte Carlo algorithm for fast projective clustering

Proceedings of the 2002 ACM SIGMOD international conference on Management of data
Feature Selection for Clustering - A Filter Solution

ICDM '02 Proceedings of the 2002 IEEE International Conference on Data Mining
d-Clusters: Capturing Subspace Correlation in a Large Data Set

ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Subspace clustering for high dimensional data: a review

ACM SIGKDD Explorations Newsletter - Special issue on learning from imbalanced datasets
Subspace Selection for Clustering High-Dimensional Data

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining
Density Connected Clustering with Local Subspace Preferences

ICDM '04 Proceedings of the Fourth IEEE International Conference on Data Mining

Grid-based subspace clustering over data streams

Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
VISA: visual subspace clustering analysis

ACM SIGKDD Explorations Newsletter - Special issue on visual analytics
Finding non-redundant, statistically significant regions in high dimensional data: a novel approach to projected and subspace clustering

Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining
Memory efficient subspace clustering for online data streams

IDEAS '08 Proceedings of the 2008 international symposium on Database engineering & applications
Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering

Proceedings of the VLDB Endowment
EDSC: efficient density-based subspace clustering

Proceedings of the 17th ACM conference on Information and knowledge management
Clustering high-dimensional data: A survey on subspace clustering, pattern-based clustering, and correlation clustering

ACM Transactions on Knowledge Discovery from Data (TKDD)
Subspace and projected clustering: experimental evaluation and analysis

Knowledge and Information Systems
Evaluating clustering in subspace projections of high dimensional data

Proceedings of the VLDB Endowment
Projected Gustafson Kessel Clustering

RSFDGrC '09 Proceedings of the 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing
Detection and visualization of subspace cluster hierarchies

DASFAA'07 Proceedings of the 12th international conference on Database systems for advanced applications
Mining quality-aware subspace clusters

PAKDD'08 Proceedings of the 12th Pacific-Asia conference on Advances in knowledge discovery and data mining
Mining representative subspace clusters in high-dimensional data

FSKD'09 Proceedings of the 6th international conference on Fuzzy systems and knowledge discovery - Volume 1
Projected Gustafson-Kessel clustering algorithm and its convergence

Transactions on rough sets XIV
The role of hubness in clustering high-dimensional data

PAKDD'11 Proceedings of the 15th Pacific-Asia conference on Advances in knowledge discovery and data mining - Volume Part I
Clustering very large multi-dimensional datasets with MapReduce

Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining
An extension of the PMML standard to subspace clustering models

Proceedings of the 2011 workshop on Predictive markup language modeling
Scalable density-based subspace clustering

Proceedings of the 20th ACM international conference on Information and knowledge management
Finding hierarchies of subspace clusters

PKDD'06 Proceedings of the 10th European conference on Principle and Practice of Knowledge Discovery in Databases
Fuse: towards multi-level functional summarization of protein interaction networks

Proceedings of the 2nd ACM Conference on Bioinformatics, Computational Biology and Biomedicine
Partitive clustering (K-means family)

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Reduct and variance based clustering of high dimensional dataset

ICDEM'10 Proceedings of the Second international conference on Data Engineering and Management
Subspace clustering

Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
A survey on enhanced subspace clustering

Data Mining and Knowledge Discovery
Projective clustering ensembles

Data Mining and Knowledge Discovery
Stochastic subspace search for top-k multi-view clustering

Proceedings of the 4th MultiClust Workshop on Multiple Clusterings, Multi-view Data, and Multi-source Knowledge-driven Clustering
Subspace clustering of high-dimensional data: an evolutionary approach

Applied Computational Intelligence and Soft Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

Subspace clustering has been investigated extensively since traditional clustering algorithms often fail to detect meaningful clusters in high-dimensional data spaces. Many recently proposed subspace clustering methods suffer from two severe problems: First, the algorithms typically scale exponentially with the data dimensionality and/or the subspace dimensionality of the clusters. Second, for performance reasons, many algorithms use a global density threshold for clustering, which is quite questionable since clusters in subspaces of significantly different dimensionality will most likely exhibt significantly varying densities. In this paper, we propose a generic framework to overcome these limitations. Our framework is based on an efficient filter-refinement architecture that scales at most quadratic w.r.t. the data dimensionality and the dimensionality of the subspace clusters. It can be applied to any clustering notions including notions that are based on a local density threshold. A broad experimental evaluation on synthetic and real-world data empirically shows that our method achieves a significant gain of runtime and quality in comparison to state-of-the-art subspace clustering algorithms.