Detecting clusters in moderate-to-high dimensional data: subspace clustering, pattern-based clustering, and correlation clustering

Authors:
Hans-Peter Kriegel;Peer Kröger;Arthur Zimek
Affiliations:
Ludwig-Maximilians-Universität München, München, Germany;Ludwig-Maximilians-Universität München, München, Germany;Ludwig-Maximilians-Universität München, München, Germany
Venue:
Proceedings of the VLDB Endowment
Year:
2008

Abstract

As a prolific research area in data mining, subspace clustering and related problems have induced a vast number of proposed solutions. However, many publications compare a new proposition (if at all) with only one or two competitors, or even with a so-called "naïve" ad hoc solution, but fail to clarify the exact problem definition. As a consequence, even if two solutions are thoroughly compared experimentally, it often remains unclear whether both solutions tackle the same problem or, if they do, whether they agree on certain tacit assumptions and how such assumptions may influence the outcome of an algorithm. In this tutorial, we try to clarify (i) the different problem definitions related to subspace clustering in general, (ii) the specific difficulties encountered in this field of research, (iii) the varying assumptions, heuristics, and intuitions forming the basis of different approaches, and (iv) how several prominent solutions essentially tackle different problems.