Clustering is an essential approach for detecting the intrinsic groups in data. This paper proposes an efficient clustering algorithm based on a generalized local synchronization model. A novel stopping criterion for data synchronization lets the algorithm detect clusters before perfect synchronization is reached. In addition, a density-biased sampling method extracts samples from the original data set; because the clustering structure can be revealed effectively on the samples alone, clustering efficiency improves significantly. Using a cluster validity criterion, the proposed algorithm can find clusters of arbitrary number, shape, size, and density, and can isolate noise in vector data, without any assumption about the data distribution. Extensive experiments on several synthetic and real-world data sets show that the proposed algorithm achieves high accuracy and is more efficient than the state-of-the-art synchronization-based clustering method.
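
To make the idea of local synchronization concrete, the following is a minimal sketch of a basic Kuramoto-style update, in which every point repeatedly moves toward the points lying within a radius `eps` of it until nearby points collapse onto a common location. It is not the generalized model proposed here. The function names, the parameters `eps`, `tol`, `max_iter`, and `merge_tol`, and the simple "largest movement below `tol`" stopping rule are all assumptions made for illustration; in particular, that rule is only a stand-in for the paper's stopping criterion, and the sketch omits density-biased sampling and cluster validity entirely.

```python
import math


def sync_step(points, eps):
    """One synchronous update: each point moves toward its eps-neighbors.

    The neighborhood includes the point itself (it contributes sin(0) = 0
    but counts in the average), which damps the update and avoids overshoot.
    """
    new_points = []
    for p in points:
        nbrs = [q for q in points if math.dist(p, q) <= eps]
        moved = tuple(
            p[d] + sum(math.sin(q[d] - p[d]) for q in nbrs) / len(nbrs)
            for d in range(len(p))
        )
        new_points.append(moved)
    return new_points


def synchronize(points, eps, tol=1e-6, max_iter=100):
    """Iterate sync_step until no point moves more than tol (assumed rule)."""
    pts = [tuple(p) for p in points]
    for _ in range(max_iter):
        nxt = sync_step(pts, eps)
        shift = max(math.dist(a, b) for a, b in zip(pts, nxt))
        pts = nxt
        if shift < tol:
            break
    return pts


def label_clusters(synced, merge_tol=1e-3):
    """Give the same label to points that ended up at (nearly) the same place."""
    centers, labels = [], []
    for p in synced:
        for k, c in enumerate(centers):
            if math.dist(p, c) <= merge_tol:
                labels.append(k)
                break
        else:
            centers.append(p)
            labels.append(len(centers) - 1)
    return labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (1.0, 1.0), (1.1, 1.0), (1.0, 1.1),
            (3.0, 3.0)]
    print(label_clusters(synchronize(data, eps=0.3)))
```

In this toy example the two tight groups each collapse to a single location, while the isolated point at `(3.0, 3.0)` has no neighbors and never moves, so it ends up in a cluster of its own. Treating such singleton clusters as noise is one simple way to read the abstract's remark about isolating noise. The sketch assumes coordinates are scaled so that differences between neighbors are small, where the sine term behaves like an attractive force.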