Evaluating clustering in subspace projections of high dimensional data

Authors:
Emmanuel Müller;Stephan Günnemann;Ira Assent;Thomas Seidl
Affiliations:
RWTH Aachen University, Germany;RWTH Aachen University, Germany;Aalborg University, Denmark;RWTH Aachen University, Germany
Venue:
Proceedings of the VLDB Endowment
Year:
2009

Abstract

Clustering high-dimensional data is an emerging research field. Subspace clustering and projected clustering group similar objects in subspaces, i.e., projections, of the full space. In the past decade, several clustering paradigms have been developed in parallel, without thorough evaluation and comparison of these paradigms on a common basis. Conclusive evaluation and comparison are hindered by three major issues. First, there is no ground truth that describes the "true" clusters in real-world data. Second, a large variety of evaluation measures has been used, each reflecting different aspects of the clustering result. Finally, in typical publications authors have limited their analysis to their favored paradigm, paying little or no attention to other paradigms. In this paper, we take a systematic approach to evaluating the major paradigms in a common framework. We study representative clustering algorithms to characterize the different aspects of each paradigm and give a detailed comparison of their properties. We provide a benchmark set of results on a large variety of real-world and synthetic data sets. By using different evaluation measures, we broaden the scope of the experimental analysis and create a common baseline for future developments and comparable evaluations in the field. For repeatability, all implementations, data sets, and evaluation measures are available on our website.