Adapting the right measures for K-means clustering

Authors:
Junjie Wu; Hui Xiong; Jian Chen
Affiliations:
Beihang University, Beijing, China; Rutgers University, Newark, NJ, USA; Tsinghua University, Beijing, China
Venue:
Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2009

Abstract

Clustering validation is a long-standing challenge in the clustering literature. Although many validation measures have been developed for evaluating clustering algorithms, they often give inconsistent information about clustering performance, and it remains unclear which measures are most suitable in practice. This paper addresses that gap with an organized study of 16 external validation measures for K-means clustering. Specifically, we first show the importance of measure normalization when evaluating clustering performance on data with imbalanced class distributions, and we provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures, which can guide the selection of validation measures in different application scenarios. Finally, we reveal the interrelationships among these external measures: by mathematical transformation, we show that some validation measures are equivalent, and that some others give consistent validation results. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
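
The abstract's point about imbalanced class distributions can be illustrated with a minimal sketch. It is not taken from the paper: the function names are hypothetical, purity and mutual information stand in for the 16 measures studied, and dividing by the geometric mean of the two entropies is one common normalization, assumed here only for illustration and not necessarily the one the authors propose.

```python
from collections import Counter
from math import log, sqrt


def purity(classes, clusters):
    """Fraction of objects that belong to the majority class of their cluster."""
    table = Counter(zip(classes, clusters))  # (class, cluster) -> count
    best = {}
    for (_, cluster), count in table.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(classes)


def entropy(labels):
    """Shannon entropy (natural log) of a label distribution."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(classes, clusters):
    """Mutual information between the class labels and the cluster labels."""
    n = len(classes)
    table = Counter(zip(classes, clusters))
    class_sizes = Counter(classes)
    cluster_sizes = Counter(clusters)
    return sum(
        (count / n) * log(count * n / (class_sizes[cls] * cluster_sizes[clu]))
        for (cls, clu), count in table.items()
    )


def normalized_mi(classes, clusters):
    """Mutual information divided by the geometric mean of the two entropies.

    Assumption for this sketch: if either labeling has zero entropy
    (a single class or a single cluster), the score is defined as 0.0.
    """
    h_classes, h_clusters = entropy(classes), entropy(clusters)
    if h_classes == 0.0 or h_clusters == 0.0:
        return 0.0
    return mutual_information(classes, clusters) / sqrt(h_classes * h_clusters)


if __name__ == "__main__":
    # Imbalanced ground truth: 90 objects of class "a", 10 of class "b".
    classes = ["a"] * 90 + ["b"] * 10
    one_cluster = [0] * 100          # degenerate clustering: everything together
    perfect = [0] * 90 + [1] * 10    # clustering that recovers the classes

    for name, clusters in [("one cluster", one_cluster), ("perfect", perfect)]:
        print(name, purity(classes, clusters), normalized_mi(classes, clusters))
```

On this toy data the degenerate single-cluster result still reaches a purity of 0.9, because the majority class dominates, while its normalized mutual information is 0. The two measures therefore disagree sharply about the same clustering, which is the kind of inconsistency the abstract describes.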