Improved Parameterless K-Means: Auto-Generation Centroids and Distance Data Point Clusters

Authors:
Wan Maseri Binti Wan Mohd; A.H. Beg; Tutut Herawan; A. Noraziah; K. F. Rabbi
Affiliations:
University Malaysia Pahang, Malaysia (all authors)
Venue:
International Journal of Information Retrieval Research
Year:
2011

Abstract

K-means is an unsupervised, partitioning clustering algorithm. It is popular and widely used for its simplicity and speed. K-means clustering produces a number of separate, flat (non-hierarchical) clusters and is suitable for generating globular clusters. The main drawback of the k-means algorithm is that the user must specify the number of clusters in advance. This paper presents an improved version of the K-means algorithm that automatically generates the initial number of clusters k and uses a new approach to defining the initial centroids, making the clustering process more effective and efficient. The underlying mechanism has been analyzed and evaluated experimentally. The experimental results show that the number of iterations is reduced by 50% and that the run time is lower and remains constant: it depends on the maximum distance between data points rather than on the number of data points.
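The abstract does not spell out how k and the initial centroids are generated. As a rough illustration of the general idea only, the sketch below pairs standard Lloyd-style K-means, which needs its centroids (and therefore k) up front, with a maximin (farthest-first) seeding heuristic that derives k from a threshold on the maximum distance between data points. The seeding rule, the `ratio` parameter and the function names `maximin_seeds` and `kmeans` are assumptions chosen for illustration; this is not the authors' method.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def maximin_seeds(points: Sequence[Point], ratio: float = 0.5) -> List[Point]:
    """Hypothetical auto-k seeding (maximin / farthest-first heuristic).

    Assumption, not the paper's method: keep adding the point farthest from
    its nearest seed while that distance exceeds `ratio` times the maximum
    distance between any two data points.
    """
    # Maximum pairwise distance sets the scale of the stopping threshold.
    max_dist = max(math.dist(p, q) for p in points for q in points)
    threshold = ratio * max_dist
    seeds = [points[0]]  # deterministic first seed
    while True:
        # Distance from every point to its nearest existing seed.
        nearest = [min(math.dist(p, s) for s in seeds) for p in points]
        far_idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far_idx] <= threshold:
            break  # every point is already close to some seed
        seeds.append(points[far_idx])
    return seeds


def kmeans(points: Sequence[Point], centroids: Sequence[Point], max_iter: int = 100):
    """Standard Lloyd-style K-means from the given initial centroids.

    Returns (centroids, labels, iterations_used).
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for it in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(xs) / len(members) for xs in zip(*members)))
            else:
                new.append(c)  # keep an empty cluster's centroid in place
        if new == centroids:
            return centroids, labels, it  # converged: centroids stopped moving
        centroids = new
    return centroids, labels, max_iter


# Two well-separated groups; k is not given, the seeding heuristic picks it.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seeds = maximin_seeds(data)
centroids, labels, iterations = kmeans(data, seeds)
print(len(seeds), labels, iterations)  # prints: 2 [0, 0, 0, 1, 1, 1] 2
```

Here the distance threshold, rather than a user-supplied k, decides how many clusters are seeded; a smaller `ratio` would yield more seeds. Better-placed seeds generally let the assignment and update steps converge in fewer iterations, which is the kind of effect the abstract reports, although this toy example makes no claim about the paper's measured results.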