Evaluation of Stability of k-Means Cluster Ensembles with Respect to Random Initialization

Authors:
Ludmila I. Kuncheva; Dmitry P. Vetrov
Affiliations:
IEEE;-
Venue:
IEEE Transactions on Pattern Analysis and Machine Intelligence
Year:
2006

Abstract

Many clustering algorithms, including cluster ensembles, rely on a random component. Stability of the results across different runs is considered to be an asset of the algorithm. The cluster ensembles considered here are based on k-means clusterers. Each clusterer is assigned a random target number of clusters, k, and is started from a random initialization. Here, we use 10 artificial and 10 real data sets to study ensemble stability with respect to random k and random initialization. The data sets were chosen to have a small number of clusters (two to seven) and a moderate number of data points (up to a few hundred). Pairwise stability is defined as the adjusted Rand index between pairs of clusterers in the ensemble, averaged across all pairs. Nonpairwise stability is defined as the entropy of the consensus matrix of the ensemble. An experimental comparison with the stability of the standard k-means algorithm was carried out for k from 2 to 20. The results revealed that ensembles are generally more stable, markedly so for larger k. To establish whether stability can serve as a cluster validity index, we first looked at the relationship between stability and accuracy with respect to the number of clusters, k. We found that this relationship depends strongly on the data set, varying from almost perfect positive correlation (0.97, for the glass data) to almost perfect negative correlation (-0.93, for the crabs data). We propose a new combined stability index, defined as the sum of the pairwise individual and ensemble stabilities. This index was found to correlate better with the ensemble accuracy. Following the hypothesis that a point of stability of a clustering algorithm corresponds to a structure found in the data, we used the stability measures to pick the number of clusters. The combined stability index gave the best results.
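
The two stability measures named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names are hypothetical, and the inputs are assumed to be label vectors already produced by repeated clustering runs on the same data, each with at least two points. The pairwise measure follows the abstract directly: the adjusted Rand index averaged over all pairs of partitions. The abstract does not give the exact form of the entropy-based measure, so the version below, the mean binary entropy of the off-diagonal consensus-matrix entries, is an assumption.

```python
from collections import Counter
from itertools import combinations
from math import comb, log2


def adjusted_rand_index(a, b):
    """Adjusted Rand index between two label sequences of equal length (n >= 2)."""
    n = len(a)
    # Pair counts from the contingency table and from its two margins.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        # Degenerate case (for example, both partitions put all points in
        # one cluster); treated here as perfect agreement.
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def pairwise_stability(partitions):
    """Mean adjusted Rand index over all unordered pairs of partitions."""
    scores = [adjusted_rand_index(p, q) for p, q in combinations(partitions, 2)]
    return sum(scores) / len(scores)


def consensus_matrix(partitions):
    """Entry (i, j) is the fraction of partitions that co-cluster points i and j."""
    n = len(partitions[0])
    runs = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / runs for j in range(n)]
            for i in range(n)]


def consensus_entropy(partitions):
    """Mean binary entropy of the off-diagonal consensus entries.

    ASSUMPTION: this normalisation is one plausible reading of "entropy of
    the consensus matrix"; the abstract does not specify the formula.
    """
    m = consensus_matrix(partitions)
    n = len(m)

    def binary_entropy(x):
        if x in (0.0, 1.0):
            return 0.0
        return -(x * log2(x) + (1 - x) * log2(1 - x))

    total = sum(binary_entropy(m[i][j]) for i, j in combinations(range(n), 2))
    return total / comb(n, 2)


if __name__ == "__main__":
    # Three hypothetical runs on four points; the first two agree up to relabelling.
    runs = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
    print("pairwise stability:", pairwise_stability(runs))
    print("consensus entropy:", consensus_entropy(runs))
```

Under this reading, a perfectly reproducible algorithm has pairwise stability 1 and consensus entropy 0, so lower entropy corresponds to higher stability.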