Cluster ensemble selection based on relative validity indexes

Authors:
M. C. Naldi; A. C. Carvalho; R. J. Campello
Affiliations:
Federal University of Viçosa-UFV, Rio Paranaíba, Brazil CEP 38.810-000; University of São Paulo-USP, São Carlos, Brazil CEP 13560-970; University of São Paulo-USP, São Carlos, Brazil CEP 13560-970
Venue:
Data Mining and Knowledge Discovery
Year:
2013

Abstract

Cluster ensembles aim to produce high-quality data partitions by combining a set of different partitions produced from the same data. Diversity and quality are claimed to be critical for the selection of the partitions to be combined. To enhance these characteristics, methods can be applied to evaluate and select a subset of the partitions that provides ensemble results similar to or better than those based on the full set of partitions. Previous studies have shown that this selection can significantly improve the quality of the final partitions. To achieve this, the candidate partitions to be combined must be evaluated appropriately. In this work, several methods to evaluate and select partitions are investigated, most of them based on relative clustering validity indexes. These indexes are used to select the highest-quality partitions to participate in the ensemble. However, each relative index may be better suited to particular data conformations. Thus, distinct relative indexes are combined to create a final evaluation that tends to be robust to changes in the application scenario, as the majority of the combined indexes may compensate for the poor performance of some individual indexes. We also investigate the impact of the diversity among the partitions used for the ensemble. A comparative evaluation of results obtained from an extensive collection of experiments involving state-of-the-art methods and statistical tests is presented. Based on the obtained results, a practical design approach is proposed to support cluster ensemble selection. This approach was successfully applied to real public domain data sets.
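
The idea of combining several relative indexes into a single evaluation can be illustrated with a minimal Python sketch. It assumes that each candidate partition has already been scored by each index and that higher scores are better (an index where lower is better would need its scores negated first). Mean-rank aggregation is an assumption made here for illustration and is not necessarily the combination strategy used in the paper. The function names `rank_descending` and `select_partitions` are hypothetical.

```python
from statistics import mean


def rank_descending(scores):
    """Rank scores so that the highest score gets rank 1.

    Tied scores share the average of the ranks they span.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Extend `end` over the run of candidates tied with order[pos].
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for j in range(pos, end + 1):
            ranks[order[j]] = shared_rank
        pos = end + 1
    return ranks


def select_partitions(index_scores, k):
    """Return the positions of the k candidate partitions with the best mean rank.

    index_scores maps an index name to a list of scores, one per candidate
    partition, all lists in the same candidate order (assumed layout).
    """
    per_index_ranks = [rank_descending(s) for s in index_scores.values()]
    n_candidates = len(per_index_ranks[0])
    combined = [mean(r[i] for r in per_index_ranks) for i in range(n_candidates)]
    # Lower mean rank means the candidate was rated better overall.
    return sorted(range(n_candidates), key=lambda i: combined[i])[:k]
```

With three hypothetical indexes that disagree on the best candidate, a partition ranked poorly by one index can still be selected when the other two rank it highly, which is the compensation effect the abstract describes.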