Initializing K-means Batch Clustering: A Critical Evaluation of Several Techniques

Authors:
Douglas Steinley; Michael J. Brusco
Affiliations:
University of Missouri-Columbia, Columbia, MO, USA; Florida State University, Tallahassee, FL, USA
Venue:
Journal of Classification
Year:
2007

Cited 16

An efficient k'-means clustering algorithm
Pattern Recognition Letters

Initializing Partition-Optimization Algorithms
IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)

A simple method for screening variables before clustering microarray data
Computational Statistics & Data Analysis

Autocorrelation-based fuzzy clustering of time series
Fuzzy Sets and Systems

A class of fuzzy clusterwise regression models
Information Sciences: an International Journal

Fuzzy clustering of time series in the frequency domain
Information Sciences: an International Journal

A review on particle swarm optimization algorithms and their applications to data clustering
Artificial Intelligence Review

Minkowski metric, feature weighting and anomalous cluster initializing in K-Means clustering
Pattern Recognition

EasyTracker: automatic transit tracking, mapping, and arrival time prediction using smartphones
Proceedings of the 9th ACM Conference on Embedded Networked Sensor Systems

Partitive clustering (K-means family)
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery

Clustering by analytic functions
Information Sciences: an International Journal

On initializations for the Minkowski weighted k-means
IDA'12 Proceedings of the 11th international conference on Advances in Intelligent Data Analysis

Fuzzy clustering of human activity patterns
Fuzzy Sets and Systems

An empirical evaluation of different initializations on the number of k-means iterations
MICAI'12 Proceedings of the 11th Mexican international conference on Advances in Artificial Intelligence - Volume Part I

Interpretable clustering using unsupervised binary trees
Advances in Data Analysis and Classification

Analysis of the k-means algorithm in the case of data points occurring on the border of two or more clusters
Knowledge-Based Systems

Abstract

K-means clustering is arguably the most popular technique for partitioning data. Unfortunately, K-means suffers from the well-known problem of locally optimal solutions. Furthermore, the final partition is dependent upon the initial configuration, making the choice of starting partitions all the more important. This paper evaluates 12 procedures proposed in the literature and provides recommendations for best practices.
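
The dependence on the starting configuration can be illustrated with a small sketch. The code below is not one of the 12 procedures evaluated in the paper; it is a minimal, assumed implementation of batch (Lloyd-style) K-means in plain Python, run on a hypothetical four-point data set from two hand-picked sets of starting centers. The function names (`sq_dist`, `assign`, `batch_kmeans`) and the data are illustrative assumptions only.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centers):
    """Label each point with the index of its nearest center."""
    return [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points]


def batch_kmeans(points, init_centers, max_iter=100):
    """Batch K-means: alternate assignment and mean updates until stable.

    Returns (labels, centers, sse), where sse is the within-cluster
    sum of squared errors of the final partition.
    """
    centers = [tuple(c) for c in init_centers]
    for _ in range(max_iter):
        labels = assign(points, centers)
        new_centers = []
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # New center is the coordinate-wise mean of the members.
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                # Assumption: an empty cluster keeps its previous center.
                new_centers.append(centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    labels = assign(points, centers)
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, sse


# Hypothetical data: corners of a 4-by-1 rectangle, K = 2.
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]

# Start A: centers placed at the left and right edges.
labels_a, centers_a, sse_a = batch_kmeans(points, [(0.0, 0.5), (4.0, 0.5)])

# Start B: centers placed on the bottom and top edges.
labels_b, centers_b, sse_b = batch_kmeans(points, [(2.0, 0.0), (2.0, 1.0)])

print("start A:", labels_a, sse_a)
print("start B:", labels_b, sse_b)
```

In this toy case both runs stop at a stable partition, but start A separates left from right (SSE 1.0) while start B separates bottom from top (SSE 16.0). The second result is a locally optimal solution that the algorithm cannot leave, which is the kind of outcome that makes the choice of starting partition matter.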