Alternatives to the k-means algorithm that find better clusterings

  • Authors:
  • Greg Hamerly; Charles Elkan

  • Affiliations:
  • University of California, San Diego, La Jolla, CA; University of California, San Diego, La Jolla, CA

  • Venue:
  • Proceedings of the eleventh international conference on Information and knowledge management
  • Year:
  • 2002

Abstract

We investigate here the behavior of the standard k-means clustering algorithm and several alternatives to it: the k-harmonic means algorithm due to Zhang and colleagues, fuzzy k-means, Gaussian expectation-maximization, and two new variants of k-harmonic means. Our aim is to find which aspects of these algorithms contribute to finding good clusterings, as opposed to converging to a low-quality local optimum. We describe each algorithm in a unified framework that introduces separate cluster membership and data weight functions. We then show that the algorithms do behave very differently from each other on simple low-dimensional synthetic datasets and image segmentation tasks, and that the k-harmonic means method is superior. Having a soft membership function is essential for finding high-quality clusterings, and having a non-constant data weight function is also useful.
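
To make the unified framework concrete, below is a minimal NumPy sketch (not the authors' implementation) of the generic center update driven by a cluster membership function and a data weight function, instantiated for standard k-means (hard membership, constant weight) and for k-harmonic means (soft membership, non-constant weight). The exponent p = 3.5, the toy data, and all function names are illustrative assumptions.

```python
# Sketch of the membership/weight view of center-based clustering:
# each algorithm supplies m(c_j | x_i) and w(x_i), and centers are
# updated as c_j = sum_i m_ij * w_i * x_i / sum_i m_ij * w_i.
import numpy as np

def sq_dists(X, C):
    """Squared Euclidean distances, shape (n_points, k)."""
    return ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)

def update_centers(X, C, membership, weight):
    """One generic update step for any (membership, weight) pair."""
    M = membership(X, C)               # (n, k), rows sum to 1
    w = weight(X, C)                   # (n,)
    W = M * w[:, None]                 # combined per-point, per-cluster weights
    return (W.T @ X) / (W.sum(axis=0)[:, None] + 1e-12)  # guard empty clusters

# Standard k-means: hard membership, constant data weight.
def km_membership(X, C):
    M = np.zeros((len(X), len(C)))
    M[np.arange(len(X)), sq_dists(X, C).argmin(axis=1)] = 1.0
    return M

def km_weight(X, C):
    return np.ones(len(X))

# K-harmonic means (KHM_p): soft membership and non-constant weight
# (generalized KHM form; p = 3.5 is an assumed illustrative value).
def khm_membership(X, C, p=3.5):
    d = np.sqrt(sq_dists(X, C)) + 1e-12
    a = d ** (-p - 2)
    return a / a.sum(axis=1, keepdims=True)

def khm_weight(X, C, p=3.5):
    d = np.sqrt(sq_dists(X, C)) + 1e-12
    return (d ** (-p - 2)).sum(axis=1) / (d ** (-p)).sum(axis=1) ** 2

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated 2-D clusters, but a deliberately poor initialization.
    X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in (0.0, 3.0, 6.0)])
    C0 = rng.normal(3.0, 1.0, size=(3, 2))
    C_km, C_khm = C0.copy(), C0.copy()
    for _ in range(50):
        C_km = update_centers(X, C_km, km_membership, km_weight)
        C_khm = update_centers(X, C_khm, khm_membership, khm_weight)
    print("k-means centers:\n", C_km)
    print("KHM centers:\n", C_khm)
```

With a bad initialization like the one above, the hard k-means assignments tend to get stuck near the starting configuration, while the soft KHM membership and its non-constant weight typically pull all centers onto the dense regions; this is the kind of behavioral difference the abstract refers to.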