A General Framework for Increasing the Robustness of PCA-Based Correlation Clustering Algorithms

Authors:
Hans-Peter Kriegel;Peer Kröger;Erich Schubert;Arthur Zimek
Affiliations:
Institute for Informatics, Ludwig-Maximilians-Universität München;Institute for Informatics, Ludwig-Maximilians-Universität München;Institute for Informatics, Ludwig-Maximilians-Universität München;Institute for Informatics, Ludwig-Maximilians-Universität München
Venue:
SSDBM '08: Proceedings of the 20th International Conference on Scientific and Statistical Database Management
Year:
2008

Abstract

Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.
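
To make the abstract's claim about outlier sensitivity concrete, here is a minimal sketch in Python with NumPy. It is not the framework proposed in the paper: the synthetic data, the helper names (`principal_direction`, `gaussian_distance_weights`, `angle_to`), and the Gaussian distance weighting around the coordinate-wise median are all assumptions chosen for illustration. The sketch compares the first principal component of a plain covariance matrix with that of a weighted covariance matrix when a few sample points do not follow the cluster's correlation.

```python
import numpy as np


def principal_direction(points, weights=None):
    """Unit eigenvector with the largest eigenvalue of the (weighted) covariance."""
    points = np.asarray(points, dtype=float)
    if weights is None:
        weights = np.ones(len(points))
    weights = np.asarray(weights, dtype=float)
    mean = np.average(points, axis=0, weights=weights)
    centered = points - mean
    cov = (centered * weights[:, None]).T @ centered / weights.sum()
    # eigh returns eigenvalues in ascending order, so the last column
    # belongs to the largest eigenvalue.
    _, eigvecs = np.linalg.eigh(cov)
    return eigvecs[:, -1]


def gaussian_distance_weights(points):
    """Illustrative weighting (an assumption): down-weight points far from the median."""
    points = np.asarray(points, dtype=float)
    center = np.median(points, axis=0)
    dist = np.linalg.norm(points - center, axis=1)
    scale = np.median(dist)
    return np.exp(-0.5 * (dist / scale) ** 2)


def angle_to(direction, reference):
    """Angle in degrees between two unit vectors, ignoring their sign."""
    cos = np.clip(abs(direction @ reference), 0.0, 1.0)
    return np.degrees(np.arccos(cos))


rng = np.random.default_rng(0)

# 100 points that follow the correlation y = x, with a little noise.
t = rng.uniform(-1.0, 1.0, 100)
inliers = np.column_stack([t, t]) + rng.normal(scale=0.02, size=(100, 2))

# 5 points (under 5% of the sample) that follow a different correlation.
outliers = np.array(
    [[-4.0, 4.0], [4.0, -4.0], [-4.0, 4.2], [4.0, -3.8], [-3.8, 4.0]]
)
sample = np.vstack([inliers, outliers])

true_direction = np.array([1.0, 1.0]) / np.sqrt(2.0)

plain_angle = angle_to(principal_direction(sample), true_direction)
robust_angle = angle_to(
    principal_direction(sample, gaussian_distance_weights(sample)),
    true_direction,
)

print(f"plain PCA deviates by    {plain_angle:.1f} degrees")
print(f"weighted PCA deviates by {robust_angle:.1f} degrees")
```

In this toy setting the five distant points dominate the unweighted covariance, so the plain principal component points almost perpendicular to the true correlation, while the weighted variant stays close to it. The weighting function and its scale are arbitrary choices here, and other robust estimators would serve the same illustrative purpose.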