How can clustering address the challenges posed by the "curse of dimensionality"? Clustering is a powerful data mining technique for structuring and organizing vast amounts of data. However, high-dimensional data spaces are usually very sparse, and meaningful clusters can be found only in lower-dimensional subspaces. In many applications, the subspaces hosting the clusters provide valuable information for interpreting the major patterns in the data. Detecting subspace clusters is challenging because many of the attributes are usually noisy, some attributes may be correlated with each other, and only a few attributes truly contribute to the cluster structure. In this paper, we propose ORSC (Arbitrarily ORiented Synchronized Clusters), a novel, effective, and efficient subspace clustering method inspired by synchronization. Synchronization is a basic phenomenon prevalent in nature that can control even highly complex processes, such as opinion formation in a group, through simple operations based on interactions between objects. Relying on an interaction model for synchronization, ORSC (1) naturally detects correlation clusters in arbitrarily oriented subspaces, including (2) arbitrarily shaped non-linear correlation clusters. Our approach is (3) robust against noise points and outliers. In contrast to previous methods, ORSC is (4) easy to parameterize, since the subspace dimensionality need not be specified and all interesting subspace clusters can be detected. Finally, (5) ORSC outperforms most of the compared methods in runtime efficiency and is highly scalable to large, high-dimensional data sets.
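The abstract does not spell out the interaction model it relies on. For intuition only, the sketch below shows a Kuramoto-style interaction of the kind used in earlier synchronization-based clustering. Each object repeatedly moves toward its ε-neighbours by the mean of `sin(y_d - x_d)` in every dimension. Objects in the same dense region therefore collapse onto a common point, while isolated objects stay put. This is an assumption-laden illustration, not ORSC itself: it does not reproduce ORSC's detection of arbitrarily oriented or non-linear correlation subspaces. The function names (`sync_step`, `order_parameter`, `sync_cluster`) and the parameters `eps`, `tol`, `max_iter`, and `merge_dist` are hypothetical.

```python
import math


def neighbors(points, i, eps):
    """Indices of all points within Euclidean distance eps of points[i] (including i)."""
    return [j for j, y in enumerate(points) if math.dist(points[i], y) <= eps]


def sync_step(points, eps):
    """One Kuramoto-style update (illustrative assumption, not ORSC's exact model):
    each coordinate moves by the mean of sin(y_d - x_d) over the eps-neighbours."""
    new_points = []
    for i, x in enumerate(points):
        nbrs = neighbors(points, i, eps)
        new_points.append(tuple(
            x[d] + sum(math.sin(points[j][d] - x[d]) for j in nbrs) / len(nbrs)
            for d in range(len(x))
        ))
    return new_points


def order_parameter(points, eps):
    """Local order parameter in (0, 1]; it approaches 1 as every neighbourhood
    collapses onto a single position."""
    total = 0.0
    for i, x in enumerate(points):
        nbrs = neighbors(points, i, eps)
        total += sum(math.exp(-math.dist(x, points[j])) for j in nbrs) / len(nbrs)
    return total / len(points)


def sync_cluster(points, eps, max_iter=200, tol=1e-6, merge_dist=1e-3):
    """Run the interaction until (near-)synchronization, then assign one label per
    group of synchronized positions. Cost is O(n^2) per iteration (no index)."""
    pts = [tuple(float(v) for v in p) for p in points]
    for _ in range(max_iter):
        if order_parameter(pts, eps) >= 1.0 - tol:
            break
        pts = sync_step(pts, eps)
    labels = [-1] * len(pts)
    next_label = 0
    for i in range(len(pts)):
        if labels[i] == -1:
            # every unlabeled point that synchronized onto pts[i] joins its cluster
            for j in range(i, len(pts)):
                if labels[j] == -1 and math.dist(pts[i], pts[j]) <= merge_dist:
                    labels[j] = next_label
            next_label += 1
    return labels


if __name__ == "__main__":
    data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
            (5, 5), (5.1, 5), (5, 5.1), (5.1, 5.1),
            (10, -10)]  # the last point is an isolated outlier
    print(sync_cluster(data, eps=0.5))
```

In this toy run, each tight group of four points synchronizes onto a shared position and receives one label. The isolated point has no neighbours besides itself, so it never moves and ends up as its own singleton group. That is one simple way an interaction-based model can keep outliers from distorting clusters. The interaction range `eps` is the only structural parameter in this sketch.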