Comparison of the performance of center-based clustering algorithms

  • Authors:
  • Bin Zhang

  • Affiliations:
  • Hewlett-Packard Research Laboratories, Palo Alto

  • Venue:
  • PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Center-based clustering algorithms like K-means, and EM are one of the most popular classes of clustering algorithms in use today. The author developed another variation in this family - K-Harmonic Means (KHM). It has been demonstrated using a small number of "benchmark" datasets that KHM is more robust than K-means and EM. In this paper, we compare their performance statistically. We run K-means, K-Harmonic Means and EM on each of 3600 pairs of (dataset, initialization) to compare the statistical average and variation of the performance of these algorithms. The results are that, for low dimensional datasets, KHM performs consistently better than KM, and KM performs consistently better than EM over a large variation of clustered-ness of the datasets and a large variation of initializations. Some of the reasons that contributed to this difference are explained.