Trial-and-Error approach for determining the number of clusters

Authors:
Haojun Sun;Mei Sun
Affiliations:
College of Mathematics and Computer Science, University of Hebei, Baoding, Hebei, China;College of Mathematics and Computer Science, University of Hebei, Baoding, Hebei, China
Venue:
ICMLC'05 Proceedings of the 4th international conference on Advances in Machine Learning and Cybernetics
Year:
2005

Abstract

Automatically determining the number of clusters is an important issue in cluster analysis. In this paper, we explore a “trial-and-error” approach to determining the number of clusters in a given data set. The fuzzy clustering algorithm FCM is selected as the basic “trial” algorithm, and cluster validity optimization serves as the “error” procedure. To improve computation speed, we propose two strategies, eliminating and splitting, which make the FCM-based algorithms more efficient. To improve on existing validity measures, we use a new validity function that is particularly suited to data sets containing overlapping clusters. Experimental results are given to illustrate the performance of the new algorithms.
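
The trial-and-error loop described in the abstract can be illustrated with a minimal sketch: run FCM for each candidate number of clusters (the "trial"), score each result with a validity index (the "error"), and keep the best-scoring candidate. The abstract does not define the paper's new validity function or its eliminating and splitting strategies, so none of them is reproduced here. As stated assumptions, this sketch substitutes the well-known Xie-Beni index, reruns FCM from scratch for every candidate, and seeds the centers with a simple farthest-point rule. All function names (`fcm`, `xie_beni`, `choose_num_clusters`, and the helpers) are hypothetical.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def maximin_init(X, c):
    """Farthest-point seeding (an assumption of this sketch, not from the paper)."""
    mean = [sum(col) / len(X) for col in zip(*X)]
    centers = [max(X, key=lambda x: sq_dist(x, mean))]
    while len(centers) < c:
        # Next seed: the point whose nearest chosen seed is farthest away.
        centers.append(max(X, key=lambda x: min(sq_dist(x, v) for v in centers)))
    return [list(v) for v in centers]


def memberships(X, centers, m):
    """FCM membership update; every row of U sums to 1."""
    U = []
    for x in X:
        # Floor the squared distance so a point sitting on a center is safe.
        inv = [max(sq_dist(x, v), 1e-12) ** (-1.0 / (m - 1.0)) for v in centers]
        total = sum(inv)
        U.append([w / total for w in inv])
    return U


def fcm(X, c, m=2.0, n_iter=200, tol=1e-10):
    """The "trial" step: fuzzy c-means for a fixed c. Returns (centers, U)."""
    centers = maximin_init(X, c)
    dim = len(X[0])
    for _ in range(n_iter):
        U = memberships(X, centers, m)
        new_centers = []
        for i in range(c):
            w = [u[i] ** m for u in U]
            total = sum(w)
            new_centers.append(
                [sum(wk * x[j] for wk, x in zip(w, X)) / total for j in range(dim)]
            )
        # Stop once no center moved more than sqrt(tol).
        shift = max(sq_dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships(X, centers, m)


def xie_beni(X, centers, U, m=2.0):
    """The "error" step: Xie-Beni index (stand-in for the paper's index). Lower is better."""
    compactness = sum(
        u[i] ** m * sq_dist(x, v) for x, u in zip(X, U) for i, v in enumerate(centers)
    )
    separation = min(
        sq_dist(a, b) for i, a in enumerate(centers) for b in centers[i + 1:]
    )
    if separation == 0:
        return float("inf")  # coincident centers: treat as the worst possible score
    return compactness / (len(X) * separation)


def choose_num_clusters(X, c_min=2, c_max=8, m=2.0):
    """Try every c in [c_min, c_max] and return (best c, all scores)."""
    scores = {}
    for c in range(c_min, c_max + 1):
        centers, U = fcm(X, c, m=m)
        scores[c] = xie_beni(X, centers, U, m=m)
    return min(scores, key=scores.get), scores


# Toy data: three well-separated Gaussian blobs in 2-D.
random.seed(0)
blob_means = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
X = [
    (mx + random.gauss(0, 0.5), my + random.gauss(0, 0.5))
    for mx, my in blob_means
    for _ in range(50)
]
best_c, scores = choose_num_clusters(X, c_min=2, c_max=6)
print(best_c)
```

On this toy data the index should be lowest at c = 3, since merging blobs inflates the compactness term and splitting a blob shrinks the separation term. Rerunning FCM from scratch for every candidate is the cost that the eliminating and splitting strategies mentioned in the abstract are meant to reduce. Xie-Beni itself may behave poorly on strongly overlapping clusters, which is the case the paper's own validity function targets.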