K-Means Initialization Methods for Improving Clustering by Simulated Annealing

Authors:
Gabriela Trazzi Perim; Estefhan Dazzi Wandekokem; Flávio Miguel Varejão
Affiliations:
Universidade Federal do Espírito Santo, Departamento de Informática, Vitória-ES, Brasil CEP 29060-900; Universidade Federal do Espírito Santo, Departamento de Informática, Vitória-ES, Brasil CEP 29060-900; Universidade Federal do Espírito Santo, Departamento de Informática, Vitória-ES, Brasil CEP 29060-900
Venue:
IBERAMIA '08 Proceedings of the 11th Ibero-American conference on AI: Advances in Artificial Intelligence
Year:
2008

Abstract

Clustering is the task of partitioning a data set so that elements within each subset are similar to one another and dissimilar to elements of other subsets. It can be framed as an optimization problem that seeks the best cluster configuration among all possible configurations. K-means is the most popular approximate algorithm for the clustering problem, but it is very sensitive to the initial solution and can get stuck in local optima. Metaheuristics can also be used to solve the problem; however, applying them directly to clustering seems to be effective only on small data sets. This work proposes using methods designed to find initial solutions for the K-means algorithm to initialize Simulated Annealing instead, so that it searches for solutions near the global optimum.
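
The idea in the abstract can be illustrated with a minimal sketch, under several assumptions that the abstract itself does not confirm. k-means++ seeding stands in for "a K-means initialization method"; the cost is the within-cluster sum of squared errors; a move reassigns one randomly chosen point; and the cooling schedule is geometric. All function names and parameter values below (`kmeanspp_seeds`, `anneal`, `t0`, `cooling`, and so on) are hypothetical choices made for illustration, not the authors' implementation.

```python
import math
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def sse(points, labels, k):
    """Sum of squared distances from each point to its cluster centroid."""
    dim = len(points[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p, c in zip(points, labels):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += p[d]
    # Only clusters that own at least one point are visited, so counts[c] > 0.
    return sum(
        sum((p[d] - sums[c][d] / counts[c]) ** 2 for d in range(dim))
        for p, c in zip(points, labels)
    )

def kmeanspp_seeds(points, k, rng):
    """k-means++ style seeding (assumes at least k distinct points)."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Weight each point by its squared distance to the nearest chosen center.
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers

def nearest_labels(points, centers):
    """Assign every point to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda c: sq_dist(p, centers[c]))
        for p in points
    ]

def anneal(points, k, labels, rng, t0=1.0, cooling=0.95,
           steps_per_temp=50, t_min=1e-3):
    """Simulated annealing over cluster assignments, started from `labels`."""
    current = list(labels)
    cur_cost = sse(points, current, k)
    best, best_cost = list(current), cur_cost
    t = t0
    while t > t_min:
        for _ in range(steps_per_temp):
            i = rng.randrange(len(points))
            old, new = current[i], rng.randrange(k)
            if new == old:
                continue
            current[i] = new  # tentative move: reassign one point
            cost = sse(points, current, k)
            delta = cost - cur_cost
            # Accept improvements always, worse moves with prob. exp(-delta/t).
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                cur_cost = cost
                if cost < best_cost:
                    best, best_cost = list(current), cost
            else:
                current[i] = old  # reject: undo the move
        t *= cooling  # geometric cooling (an assumed schedule)
    return best, best_cost

# Usage on synthetic data: three Gaussian blobs in the plane.
rng = random.Random(0)
points = [(rng.gauss(cx, 0.3), rng.gauss(cy, 0.3))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(10)]
init = nearest_labels(points, kmeanspp_seeds(points, 3, rng))
best, best_cost = anneal(points, 3, init, rng)
print(sse(points, init, 3), best_cost)
```

Because the annealer keeps the best assignment it has seen, the returned cost can never exceed the cost of the seeded start; how much it improves depends on the data and on the assumed schedule. Recomputing the full cost after each move keeps the sketch short but is slow on large data sets, where an incremental cost update would be preferable.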