A statistical model of cluster stability
Pattern Recognition
Cluster validation is the task of estimating the quality of a given partition of a data set into clusters of similar objects. Typically, a clustering algorithm requires the desired number of clusters as a parameter, and we consider the cluster validation problem of determining the optimal, "true" number of clusters. We adopt the stability-testing approach, according to which repeated applications of a given clustering algorithm produce similar results when the specified number of clusters is correct. To implement this idea, we draw pairs of independent, equal-sized samples, where one sample in each pair is drawn from the data source and the other from a noised version of it. We then run the same clustering method on both samples in each pair and test the similarity of the resulting partitions using a general k-nearest-neighbor Binomial model. These similarity measurements allow us to estimate the correct number of clusters. A series of numerical experiments on both synthetic and real-world data demonstrates the strong performance of the proposed approach compared with other methods. In particular, using a noised data set is shown to produce significantly better results than using two independent samples that are both drawn from the data source.
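The overall stability-testing loop can be sketched as follows. This is a minimal illustration, not the paper's exact procedure: it uses plain k-means as the base clusterer, noises a subsample in place rather than drawing a second independent sample, and scores partition similarity with a simple k-nearest-neighbor co-membership agreement as a stand-in for the paper's k-NN Binomial model. All function names and parameters here are illustrative assumptions.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain k-means (Lloyd's algorithm); returns a label per row of X."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each point to its nearest center.
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers, skipping clusters that became empty.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels

def knn_label_agreement(X, labels_a, labels_b, n_neighbors=5):
    """Fraction of (point, nearest-neighbor) pairs on which the two
    partitions agree about co-membership: do partitions A and B make
    the same decision on whether the pair shares a cluster?"""
    d = np.linalg.norm(X[:, None] - X[None], axis=2)
    agree = total = 0
    for i in range(len(X)):
        nn = np.argsort(d[i])[1:n_neighbors + 1]  # skip the point itself
        same_a = labels_a[nn] == labels_a[i]
        same_b = labels_b[nn] == labels_b[i]
        agree += np.sum(same_a == same_b)
        total += n_neighbors
    return agree / total

def estimate_k(X, k_range=range(2, 7), n_pairs=10, noise=0.1, seed=0):
    """For each candidate k, average the agreement between clusterings
    of a subsample and a noised copy of it; return the most stable k."""
    rng = np.random.default_rng(seed)
    scores = {}
    for k in k_range:
        vals = []
        for p in range(n_pairs):
            idx = rng.choice(len(X), size=len(X) // 2, replace=False)
            S = X[idx]
            S_noised = S + rng.normal(scale=noise, size=S.shape)
            la = kmeans(S, k, seed=p)
            lb = kmeans(S_noised, k, seed=p + 1000)
            vals.append(knn_label_agreement(S, la, lb))
        scores[k] = float(np.mean(vals))
    best_k = max(scores, key=scores.get)
    return best_k, scores
```

Note that the simple co-membership score above tends to saturate near 1 for well-separated data even when clusters are merged, which is precisely the kind of degeneracy the paper's Binomial model of k-NN label counts is designed to handle more carefully.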