OPTICS: ordering points to identify the clustering structure
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Finding generalized projected clusters in high dimensional spaces
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
When Is ''Nearest Neighbor'' Meaningful?
ICDT '99 Proceedings of the 7th International Conference on Database Theory
On the Surprising Behavior of Distance Metrics in High Dimensional Spaces
ICDT '01 Proceedings of the 8th International Conference on Database Theory
What Is the Nearest Neighbor in High Dimensional Spaces?
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Local Dimensionality Reduction: A New Approach to Indexing High Dimensional Spaces
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
Computing Clusters of Correlation Connected objects
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Machine Learning
CURLER: finding and visualizing nonlinear correlation clusters
Proceedings of the 2005 ACM SIGMOD international conference on Management of data
Mining Hierarchies of Correlation Clusters
SSDBM '06 Proceedings of the 18th International Conference on Scientific and Statistical Database Management
On Exploring Complex Relationships of Correlation Clusters
SSDBM '07 Proceedings of the 19th International Conference on Scientific and Statistical Database Management
ELKI: A Software System for Evaluation of Subspace Clustering Algorithms
SSDBM '08 Proceedings of the 20th international conference on Scientific and Statistical Database Management
Can shared-neighbor distances defeat the curse of dimensionality?
SSDBM'10 Proceedings of the 22nd international conference on Scientific and statistical database management
Feature interaction in subspace clustering using the Choquet integral
Pattern Recognition
Unsupervised feature relevance analysis applied to improve ECG heartbeat clustering
Computer Methods and Programs in Biomedicine
A survey on unsupervised outlier detection in high-dimensional numerical data
Statistical Analysis and Data Mining
Latent outlier detection and the low precision problem
Proceedings of the ACM SIGKDD Workshop on Outlier Detection and Description
Hi-index | 0.00 |
Most correlation clustering algorithms rely on principal component analysis (PCA) as a correlation analysis tool. The correlation of each cluster is learned by applying PCA to a set of sample points. Since PCA is rather sensitive to outliers, if a small fraction of these points does not correspond to the correct correlation of the cluster, the algorithms are usually misled or even fail to detect the correct results. In this paper, we evaluate the influence of outliers on PCA and propose a general framework for increasing the robustness of PCA in order to determine the correct correlation of each cluster. We further show how our framework can be applied to PCA-based correlation clustering algorithms. A thorough experimental evaluation shows the benefit of our framework on several synthetic and real-world data sets.