Fast modified global k-means algorithm for incremental cluster construction

Authors:
Adil M. Bagirov;Julien Ugon;Dean Webb
Affiliations:
Centre for Informatics and Applied Optimization, Graduate School of Information Technology and Mathematical Sciences, University of Ballarat, Victoria 3353, Australia;Centre for Informatics and Applied Optimization, Graduate School of Information Technology and Mathematical Sciences, University of Ballarat, Victoria 3353, Australia;Centre for Informatics and Applied Optimization, Graduate School of Information Technology and Mathematical Sciences, University of Ballarat, Victoria 3353, Australia
Venue:
Pattern Recognition
Year:
2011

Abstract

The k-means algorithm and its variations are known to be fast clustering algorithms. However, they are sensitive to the choice of starting points and are inefficient for solving clustering problems in large datasets. Recently, incremental approaches have been developed to resolve difficulties with the choice of starting points. The global k-means and the modified global k-means algorithms are based on such an approach: they iteratively add one cluster center at a time. Numerical experiments show that these algorithms considerably improve on the k-means algorithm. However, they require either storing the whole affinity matrix or recomputing it at each iteration. This makes both algorithms time-consuming and memory-demanding for clustering even moderately large datasets. In this paper, a new version of the modified global k-means algorithm is proposed. We introduce an auxiliary cluster function to generate a set of starting points lying in different parts of the dataset. We exploit information gathered in previous iterations of the incremental algorithm to eliminate the need to compute or store the whole affinity matrix, thereby reducing computational effort and memory usage. Results of numerical experiments on six standard datasets demonstrate that the new algorithm is more efficient than the global and the modified global k-means algorithms.
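The incremental idea described in the abstract (start with one center, then add one center at a time and refine with k-means) can be illustrated with a minimal pure-Python sketch. This is not the algorithm proposed in the paper: it follows the general "fast global k-means" style of choosing each new starting point as the data point that would most reduce the clustering error, and all function names (`sq_dist`, `kmeans`, `cluster_error`, `incremental_kmeans`) are hypothetical. Note that scoring every candidate against every point touches all pairwise distances, which is the affinity-matrix cost the abstract says the new algorithm avoids.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, centers, max_iter=100):
    """Plain k-means (Lloyd) iterations from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            j = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[j].append(p)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        new_centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def cluster_error(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def incremental_kmeans(points, k):
    """Build k clusters by adding one center at a time (illustrative only)."""
    # With a single cluster, the optimal center is the centroid of the data.
    centers = [tuple(sum(col) / len(points) for col in zip(*points))]
    while len(centers) < k:
        # d[i]: squared distance from point i to its nearest existing center.
        d = [min(sq_dist(p, c) for c in centers) for p in points]

        def decrease(y):
            # Error reduction if a new center were placed at y while the
            # existing centers stay fixed.
            return sum(max(0.0, di - sq_dist(p, y)) for p, di in zip(points, d))

        # Assumption: candidates are restricted to the data points themselves.
        best = max(points, key=decrease)
        centers = kmeans(points, centers + [tuple(best)])
    return centers
```

For example, calling `incremental_kmeans(points, 2)` on two well-separated groups of points should place one center in each group. Because every step starts from the previous solution plus one carefully chosen point, the result does not depend on random initialization, which is the motivation the abstract gives for incremental approaches.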