Improving the efficiency and efficacy of the K-means clustering algorithm through a new convergence condition

  • Authors:
  • O. Joaquín Pérez; R. Rodolfo Pazos; R. Laura Cruz; S. Gerardo Reyes; T. Rosy Basave; H. Héctor Fraire

  • Affiliations:
  • Centro Nacional de Investigación y Desarrollo Tecnológico; Centro Nacional de Investigación y Desarrollo Tecnológico; Instituto Tecnológico de Ciudad Madero; Centro Nacional de Investigación y Desarrollo Tecnológico; Centro Nacional de Investigación y Desarrollo Tecnológico; Instituto Tecnológico de Ciudad Madero

  • Venue:
  • ICCSA'07: Proceedings of the 2007 International Conference on Computational Science and Its Applications, Volume Part III
  • Year:
  • 2007

Abstract

Clustering problems arise in many different applications: machine learning, data mining, knowledge discovery, data compression, vector quantization, pattern recognition and pattern classification. One of the most popular and widely studied clustering methods is K-means. Several improvements to the standard K-means algorithm have been proposed, most of them concerning the choice of initial parameter values. In contrast, this article proposes an improvement based on a new convergence condition: execution stops when a local optimum is found or when no more object exchanges among groups can be performed. To assess the improvement, the modified algorithm (Early Stop K-means) was tested on six databases from the UCI repository, and the results were compared against SPSS, Weka and the standard K-means algorithm. In the experiments, Early Stop K-means achieved substantial reductions in the number of iterations and improvements in solution quality with respect to the other algorithms.
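The abstract states the stopping rule only in words. As a minimal sketch (not the authors' implementation), the Python code below adds a two-part test to Lloyd-style K-means: stop when no object changes group, or when an iteration fails to lower the sum of squared errors (SSE). Treating "SSE did not decrease" as the "local optimum found" signal is an assumption, as are the hypothetical function names `early_stop_kmeans` and `squared_distance`, the random choice of initial centroids and the `max_iter` cap.

```python
import math
import random
from typing import List, Sequence, Tuple


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def early_stop_kmeans(
    points: List[Sequence[float]], k: int, max_iter: int = 100, seed: int = 0
) -> Tuple[List[int], List[List[float]], float, int]:
    """Lloyd-style K-means with an ASSUMED two-part stopping rule.

    Stops when (a) no object changes group, or (b) an iteration fails to
    lower the sum of squared errors (SSE), read here as "local optimum reached".
    Returns (labels, centroids, sse, iteration_at_which_it_stopped).
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # k distinct objects as seeds
    labels = [-1] * len(points)  # -1: not yet assigned
    best_sse = math.inf
    it = 0
    for it in range(1, max_iter + 1):
        # Assignment step: each object goes to its nearest centroid (ties -> lowest index).
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        exchanges = sum(old != new for old, new in zip(labels, new_labels))
        if exchanges == 0:
            break  # condition (a): no object exchanges among groups
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep old centroid if a group empties
        sse = sum(
            squared_distance(p, new_centroids[lab]) for p, lab in zip(points, new_labels)
        )
        if sse >= best_sse:
            break  # condition (b): no SSE improvement, keep the previous solution
        labels, centroids, best_sse = new_labels, new_centroids, sse
    return labels, centroids, best_sse, it
```

In exact arithmetic the Lloyd SSE never increases, so condition (b) fires when an iteration moves objects without lowering the SSE (for example, objects tied between two centroids); such moves only shuffle objects among equally good groups. Removing the `sse >= best_sse` test leaves plain Lloyd iteration, which makes it easy to compare iteration counts on the same data.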