K-means is undoubtedly the most widely used partitional clustering algorithm. Unfortunately, due to its gradient descent nature, this algorithm is highly sensitive to the initial placement of the cluster centers. Numerous initialization methods have been proposed to address this problem. In this paper, we first present an overview of these methods with an emphasis on their computational efficiency. We then compare eight commonly used linear-time initialization methods on a large and diverse collection of data sets using various performance criteria. Finally, we analyze the experimental results using non-parametric statistical tests and provide recommendations for practitioners. We demonstrate that popular initialization methods often perform poorly and that there are in fact strong alternatives to these methods.
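The sensitivity to initial centers described above can be explored with a small sketch. It implements plain Lloyd iterations plus two seeding schemes whose cost is linear in the number of points: picking k data points uniformly at random, and k-means++ D² seeding (Arthur and Vassilvitskii, SODA '07). This is an illustrative assumption-laden sketch, not the paper's code. It does not reproduce the eight methods or the experiments compared in the paper. The helper names (`sq_dist`, `kmeanspp_seed`, `lloyd`) and the toy data are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeanspp_seed(points, k, rng):
    """k-means++ seeding: the first center is chosen uniformly at random;
    each later center is sampled with probability proportional to D(x)^2,
    the squared distance from x to its nearest already-chosen center."""
    centers = [rng.choice(points)]
    # d2[i] = squared distance from points[i] to its nearest chosen center
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        total = sum(d2)
        if total == 0:  # every point coincides with a chosen center
            centers.append(rng.choice(points))
            continue
        r = rng.random() * total
        # Fallback guards against float round-off in the cumulative sum
        idx = max(i for i, w in enumerate(d2) if w > 0)
        acc = 0.0
        for i, w in enumerate(d2):
            acc += w
            if acc > r:  # strict '>' never selects a zero-weight point
                idx = i
                break
        centers.append(points[idx])
        # Update nearest-center distances with the new center only: O(n d)
        d2 = [min(old, sq_dist(p, points[idx])) for old, p in zip(d2, points)]
    return centers


def lloyd(points, centers, max_iter=100):
    """Lloyd's k-means iterations from the given initial centers.
    Returns (centers, labels, sse)."""
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center
        new_labels = [min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        # Update step: move each center to the mean of its members
        for j in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    sse = sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
    return centers, labels, sse


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical toy data: three Gaussian blobs in 2-D
    points = [(rng.gauss(cx, 0.5), rng.gauss(cy, 0.5))
              for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(50)]
    seeders = [("random", lambda: rng.sample(points, 3)),
               ("k-means++", lambda: kmeanspp_seed(points, 3, rng))]
    for name, seed in seeders:
        sses = [lloyd(points, seed())[2] for _ in range(20)]
        print(f"{name:10s} best SSE={min(sses):.2f} worst SSE={max(sses):.2f}")
```

Rerunning Lloyd's algorithm from many different seeds and comparing the best and worst final SSE shows how strongly the result depends on initialization. A method that is good on average can still produce a wide spread across runs.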