Comparison of Four Initialization Techniques for the K-Medians Clustering Algorithm

Authors:
Alfons Juan;Enrique Vidal
Venue:
Proceedings of the Joint IAPR International Workshops on Advances in Pattern Recognition
Year:
2000


Abstract

Clustering in metric spaces can be conveniently performed by the so-called k-medians method, a variant of the popular k-means algorithm in which cluster medians (the most centered cluster points) are used instead of the conventional cluster means. Two main aspects of the k-medians algorithm deserve special attention: computing efficiency and initialization. Efficiency issues have been studied in previous works. Here we focus on initialization. Four techniques are studied: Random selection, Supervised selection, the Greedy-Interchange algorithm and the Maxmin algorithm. The capabilities of these techniques are assessed through experiments in two typical applications of clustering, namely Exploratory Data Analysis and Unsupervised Prototype Selection. Results clearly show the importance of a good initialization of the k-medians algorithm in all cases. Random initialization too often leads to bad final partitions, while the best results are generally obtained using Supervised selection. The Greedy-Interchange and Maxmin algorithms generally lead to partitions of high quality, without the manual effort of Supervised selection. Of these two algorithms, the latter is generally preferred because of its better computational behaviour.
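The abstract does not give pseudocode, so the following is only a minimal sketch of how a k-medians loop with a Maxmin-style initialization might look. It assumes that Maxmin means the usual farthest-first rule (each new seed is the point farthest from its nearest already-chosen seed) and that a cluster median is the cluster member with the smallest sum of distances to the other members. The function names `maxmin_init` and `k_medians`, the `first` and `max_iter` parameters, and the toy data are illustrative choices, not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def maxmin_init(points: Sequence[T], k: int,
                dist: Callable[[T, T], float], first: int = 0) -> List[int]:
    """Farthest-first seeding (assumed meaning of 'Maxmin').

    Returns the indices of k seed points. The first seed is an arbitrary
    choice (index `first`); each later seed maximizes the distance to its
    nearest previously chosen seed.
    """
    seeds = [first]
    nearest = [dist(p, points[first]) for p in points]
    while len(seeds) < k:
        nxt = max(range(len(points)), key=lambda i: nearest[i])
        seeds.append(nxt)
        for i, p in enumerate(points):
            nearest[i] = min(nearest[i], dist(p, points[nxt]))
    return seeds


def _assign(points, medians, dist) -> List[int]:
    """Label each point with the position of its closest median."""
    return [min(range(len(medians)), key=lambda c: dist(p, points[medians[c]]))
            for p in points]


def k_medians(points: Sequence[T], seeds: Sequence[int],
              dist: Callable[[T, T], float],
              max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Alternate assignment and median update until the medians stop changing.

    Only a distance function is required, so the points may be strings or
    other objects from a metric space rather than vectors.
    """
    medians = list(seeds)
    for _ in range(max_iter):
        labels = _assign(points, medians, dist)
        new_medians = []
        for c, old in enumerate(medians):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # possible with duplicate points; keep old median
                new_medians.append(old)
                continue
            # Set median: the member with the smallest total distance to the rest.
            new_medians.append(min(
                members,
                key=lambda i: sum(dist(points[i], points[j]) for j in members)))
        if new_medians == medians:
            break
        medians = new_medians
    return medians, _assign(points, medians, dist)


# Toy usage on one-dimensional data with absolute difference as the metric.
data = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
metric = lambda a, b: abs(a - b)
seeds = maxmin_init(data, 2, metric)
medians, labels = k_medians(data, seeds, metric)
```

The median update in this sketch costs time quadratic in the cluster size, which is the kind of efficiency issue the abstract attributes to earlier work. The seeding step needs only about k passes over the data.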