Clustering validation is a long-standing challenge in the clustering literature. Although many validation measures have been developed for evaluating the performance of clustering algorithms, these measures often give inconsistent information about clustering performance, and the most suitable measures to use in practice remain unknown. This paper addresses that gap with an organized study of 16 external validation measures for K-means clustering. Specifically, we first show the importance of measure normalization when evaluating clustering performance on data with imbalanced class distributions, and we provide normalization solutions for several measures. In addition, we summarize the major properties of these external measures; these properties can guide the selection of validation measures in different application scenarios. Finally, we reveal the interrelationships among these external measures. By mathematical transformation, we show that some validation measures are equivalent. Other measures exhibit consistent validation performance. Most importantly, we provide a guideline for selecting the most suitable validation measures for K-means clustering.
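To make the point about imbalanced class distributions concrete, the following Python sketch compares one unnormalized and one normalized external measure on a small skewed example. The abstract does not name the 16 measures or describe the normalization solutions, so the choice of purity and normalized mutual information here, the geometric-mean normalization, and all function names are assumptions made for illustration only, not the paper's procedure.

```python
from collections import Counter
from math import log, sqrt


def purity(labels_true, labels_pred):
    """Fraction of points that belong to the majority class of their cluster."""
    joint = Counter(zip(labels_pred, labels_true))
    best = {}
    for (cluster, _cls), count in joint.items():
        best[cluster] = max(best.get(cluster, 0), count)
    return sum(best.values()) / len(labels_true)


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def mutual_information(labels_true, labels_pred):
    """Mutual information between class labels and cluster labels."""
    n = len(labels_true)
    joint = Counter(zip(labels_pred, labels_true))
    clusters = Counter(labels_pred)
    classes = Counter(labels_true)
    return sum(
        (n_kc / n) * log(n * n_kc / (clusters[k] * classes[c]))
        for (k, c), n_kc in joint.items()
    )


def normalized_mutual_information(labels_true, labels_pred):
    """Mutual information divided by the geometric mean of the two entropies.

    Assumption: this is one common normalization, chosen for illustration.
    Returns 0.0 by convention when either labeling has zero entropy.
    """
    h_true, h_pred = entropy(labels_true), entropy(labels_pred)
    if h_true == 0.0 or h_pred == 0.0:
        return 0.0
    return mutual_information(labels_true, labels_pred) / sqrt(h_true * h_pred)


if __name__ == "__main__":
    # Imbalanced ground truth: 90 points of class A, 10 of class B.
    truth = ["A"] * 90 + ["B"] * 10
    # A degenerate clustering that puts every point into a single cluster.
    one_cluster = [0] * 100
    print(purity(truth, one_cluster))
    print(normalized_mutual_information(truth, one_cluster))
```

On this example the degenerate single-cluster result still reaches a purity of 0.9, simply because the majority class dominates, while the normalized measure is 0.0. This is one way to see how two external measures can give inconsistent information on skewed data.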