Comparison of the performance of center-based clustering algorithms

Authors:
Bin Zhang
Affiliations:
Hewlett-Packard Research Laboratories, Palo Alto
Venue:
PAKDD'03 Proceedings of the 7th Pacific-Asia conference on Advances in knowledge discovery and data mining
Year:
2003

Citing 4

Vector quantization and signal compression

Advances in knowledge discovery and data mining

An empirical comparison of four initialization methods for the K-Means algorithm
Pattern Recognition Letters

An experimental comparison of model-based clustering methods
Machine Learning

Cited 4

Regression Clustering
ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining

An Alternative to Center-Based Clustering Algorithm Via Statistical Learning Analysis
ICIC '08 Proceedings of the 4th international conference on Intelligent Computing: Advanced Intelligent Computing Theories and Applications - with Aspects of Artificial Intelligence

Explorative data analysis techniques and unsupervised clustering methods to support clinical assessment of Chronic Obstructive Pulmonary Disease (COPD) phenotypes
Journal of Biomedical Informatics

Discovering dangerous patterns in long-term ambulatory ECG recordings using a fast QRS detection algorithm and explorative data analysis
Computer Methods and Programs in Biomedicine

Abstract

Center-based clustering algorithms such as K-means (KM) and EM are among the most popular classes of clustering algorithms in use today. The author developed another variation in this family, K-Harmonic Means (KHM). It has been demonstrated on a small number of "benchmark" datasets that KHM is more robust than K-means and EM. In this paper, we compare their performance statistically. We run K-means, K-Harmonic Means and EM on each of 3600 (dataset, initialization) pairs to compare the statistical average and variation of the performance of these algorithms. The results show that, for low-dimensional datasets, KHM performs consistently better than KM, and KM performs consistently better than EM, over a wide range of clustered-ness of the datasets and a wide range of initializations. Some of the reasons that contribute to this difference are explained.
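
The kind of comparison described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's experimental setup: the data generator `make_dataset`, the quality measure (the K-means sum of squared distances to the nearest center), the exponent `p=3.5`, the small grid of datasets and initializations, and all function names are hypothetical. The KHM center update uses the form commonly given in the literature, which the abstract itself does not state, and EM is left out to keep the sketch short.

```python
import numpy as np

def sq_dists(X, C):
    # Squared Euclidean distance from every point to every center, shape (n, k).
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def kmeans(X, C, iters=50):
    # Plain Lloyd iterations starting from the given centers.
    C = C.copy()
    for _ in range(iters):
        labels = sq_dists(X, C).argmin(axis=1)
        for k in range(len(C)):
            members = X[labels == k]
            if len(members):  # an empty cluster keeps its previous center
                C[k] = members.mean(axis=0)
    return C

def khm(X, C, p=3.5, iters=50, eps=1e-8):
    # K-Harmonic Means update in its commonly cited form (assumed here):
    # every point contributes to every center with weight
    # d_ik^-(p+2) / (sum_l d_il^-p)^2.
    C = C.copy()
    for _ in range(iters):
        d = np.sqrt(sq_dists(X, C)) + eps
        w = d ** (-(p + 2)) / (d ** (-p)).sum(axis=1, keepdims=True) ** 2
        C = (w.T @ X) / w.sum(axis=0)[:, None]
    return C

def km_cost(X, C):
    # Assumed common yardstick: sum of squared distances to the nearest center.
    return sq_dists(X, C).min(axis=1).sum()

def make_dataset(rng, n=300, k=5, dim=2, spread=0.5):
    # Hypothetical generator: Gaussian blobs around uniformly placed centers.
    true_centers = rng.uniform(0.0, 10.0, size=(k, dim))
    return true_centers[rng.integers(0, k, size=n)] + rng.normal(scale=spread, size=(n, dim))

def compare(n_datasets=5, n_inits=4, k=5):
    # Run both algorithms from the same start on every (dataset, initialization) pair.
    costs = {"KM": [], "KHM": []}
    for ds in range(n_datasets):
        X = make_dataset(np.random.default_rng(ds), k=k)
        for init in range(n_inits):
            rng = np.random.default_rng(1000 * ds + init)
            C0 = X[rng.choice(len(X), size=k, replace=False)]
            costs["KM"].append(km_cost(X, kmeans(X, C0)))
            costs["KHM"].append(km_cost(X, khm(X, C0)))
    return {name: np.array(vals) for name, vals in costs.items()}

if __name__ == "__main__":
    for name, vals in compare().items():
        print(f"{name}: mean cost {vals.mean():.1f}, std {vals.std():.1f}")
```

Giving every algorithm the same starting centers on each pair means that differences in the mean and spread of the final cost reflect the algorithms rather than the luck of the initialization.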