On voting-based consensus of cluster ensembles

Authors:
Hanan G. Ayad; Mohamed S. Kamel
Affiliations:
Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1; Department of Electrical and Computer Engineering, University of Waterloo, Waterloo, Ontario, Canada N2L 3G1
Venue:
Pattern Recognition
Year:
2010

Abstract

Voting-based consensus clustering refers to a distinct class of consensus methods in which the cluster label mismatch problem is explicitly addressed. The voting problem is defined as the problem of finding the optimal relabeling of a given partition with respect to a reference partition. It is commonly formulated as a weighted bipartite matching problem. In this paper, we present a more general formulation of the voting problem as a regression problem with multiple-response and multiple-input variables. We show that a recently introduced cumulative voting scheme is a special case corresponding to a linear regression method. We use a randomized ensemble generation technique in which an overproduced number of clusters is randomly selected for each ensemble partition. We apply an information-theoretic algorithm, in conjunction with both bipartite matching and cumulative voting, to extract the consensus clustering from the aggregated ensemble representation and to estimate the number of clusters. We present empirical evidence that cumulative voting yields substantial improvements in clustering accuracy, stability, and estimation of the true number of clusters. The improvements are measured against consensus algorithms based on bipartite matching, which perform very poorly with the chosen ensemble generation technique, and against other recent consensus algorithms.
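
The two views of the voting problem named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names (`contingency`, `best_relabeling`, `soft_relabeling`) and the toy data are assumptions made for illustration. The first function treats relabeling as a bipartite matching, solved here by exhaustive search over permutations (adequate only for a handful of clusters; a Hungarian-algorithm solver would be the usual choice). The second shows one reading of the regression view: an ordinary least-squares fit of reference-cluster indicators on partition-cluster indicators reduces to the fraction of each cluster that falls in each reference cluster, and it allows the two partitions to have different numbers of clusters.

```python
from itertools import permutations


def contingency(labels, reference, k_labels, k_ref):
    """counts[j][l] = number of objects in cluster j of `labels` and cluster l of `reference`."""
    counts = [[0] * k_ref for _ in range(k_labels)]
    for j, l in zip(labels, reference):
        counts[j][l] += 1
    return counts


def best_relabeling(labels, reference, k):
    """Bipartite-matching view: one-to-one relabeling that maximizes agreement.

    Exhaustive search over all k! permutations (illustrative, small k only).
    """
    counts = contingency(labels, reference, k, k)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(counts[j][perm[j]] for j in range(k)),
    )
    return list(best)


def soft_relabeling(labels, reference, k_labels, k_ref):
    """Regression view (assumed reading): least-squares map between indicator matrices.

    With hard cluster indicators as inputs, the least-squares coefficients are
    the conditional frequencies counts[j][l] / size of cluster j.
    """
    counts = contingency(labels, reference, k_labels, k_ref)
    weights = []
    for row in counts:
        size = sum(row)
        # An empty cluster contributes no evidence, so its row stays at zero.
        weights.append([c / size if size else 0.0 for c in row])
    return weights


# Toy data (hypothetical): nine objects, three clusters in each partition.
reference = [0, 0, 0, 1, 1, 1, 2, 2, 2]
labels = [2, 2, 2, 0, 0, 1, 1, 1, 1]

mapping = best_relabeling(labels, reference, 3)
relabeled = [mapping[j] for j in labels]
weights = soft_relabeling(labels, reference, 3, 3)
```

On this toy data the hard matching must send cluster 1 entirely to one reference cluster, while the soft map keeps the information that a quarter of cluster 1 belongs to reference cluster 1. That retained information is the kind of evidence a one-to-one matching discards when the partitions contain more clusters than the reference.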