Improving the efficiency and efficacy of the K-means clustering algorithm through a new convergence condition

Authors:
O. Joaquín Pérez;R. Rodolfo Pazos;R. Laura Cruz;S. Gerardo Reyes;T. Rosy Basave;H. Héctor Fraire
Affiliations:
Centro Nacional de Investigación y Desarrollo Tecnológico;Centro Nacional de Investigación y Desarrollo Tecnológico;Instituto Tecnológico de Ciudad Madero;Centro Nacional de Investigación y Desarrollo Tecnológico;Centro Nacional de Investigación y Desarrollo Tecnológico;Instituto Tecnológico de Ciudad Madero
Venue:
ICCSA'07 Proceedings of the 2007 international conference on Computational science and its applications - Volume Part III
Year:
2007

Abstract

Clustering problems arise in many applications, including machine learning, data mining, knowledge discovery, data compression, vector quantization, pattern recognition and pattern classification. One of the most popular and widely studied clustering methods is K-means. Several improvements to the standard K-means algorithm have been proposed, most of them concerning the choice of initial parameter values. In contrast, this article proposes an improvement based on a new convergence condition: execution stops when a local optimum is found or when no further exchanges of objects among groups are possible. To assess the improvement, the modified algorithm (Early Stop K-means) was tested on six data sets from the UCI repository, and the results were compared against SPSS, Weka and the standard K-means algorithm. In these experiments, Early Stop K-means achieved substantial reductions in the number of iterations and improvements in solution quality relative to the other algorithms.
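
The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not define "local optimum" precisely, so the sketch assumes it means that the sum of squared errors no longer decreases from one iteration to the next. The function names (`early_stop_kmeans`, `assign`, `update`, `sse`), the random initialization, and the `max_iter` and `seed` parameters are all hypothetical choices made for illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda j: squared_distance(p, centroids[j]))
        for p in points
    ]


def update(points, labels, centroids):
    """Recompute each centroid as the mean of its members."""
    new_centroids = []
    for j, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == j]
        if members:
            new_centroids.append(
                tuple(sum(m[d] for m in members) / len(members) for d in range(len(old)))
            )
        else:
            new_centroids.append(old)  # an empty group keeps its previous centroid
    return new_centroids


def sse(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(squared_distance(p, centroids[label]) for p, label in zip(points, labels))


def early_stop_kmeans(points, k, max_iter=100, seed=0):
    """K-means with two stopping checks (an assumed reading of the abstract)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: random initial centroids
    labels, error = None, float("inf")
    iterations = 0
    for _ in range(max_iter):
        new_labels = assign(points, centroids)
        if new_labels == labels:
            break  # no object changed group
        new_centroids = update(points, new_labels, centroids)
        new_error = sse(points, new_labels, new_centroids)
        iterations += 1
        if new_error >= error:
            break  # assumed "local optimum": error stopped decreasing, keep previous solution
        labels, centroids, error = new_labels, new_centroids, new_error
    return labels, centroids, error, iterations


# Toy data (not from the paper): two well-separated groups of three points.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, error, iterations = early_stop_kmeans(points, k=2)
```

The sketch keeps the previous solution when the error fails to decrease, so the returned labels, centroids and error always describe the same clustering. How the paper itself measures the error and counts iterations may differ from this sketch.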