Consensus clustering

Authors:
Tianming Hu; Sam Yuan Sung
Affiliations:
Department of Computer Science, DongGuan University of Technology, 1 University Road, SongShan Lake District, GuangDong 523808, China; Department of Computer Science, National University of Singapore, Singapore 117543
Venue:
Intelligent Data Analysis
Year:
2005

Abstract

We address the consensus clustering problem: combining multiple partitions of a set of objects into a single consolidated partition. The input is only a set of cluster labelings; we have access neither to the original data nor to the clustering algorithms that produced these partitions. After introducing a distribution-based view of partitions, we propose a series of entropy-based distance functions for comparing partitions. Given a set of candidate partitions, consensus clustering is then formalized as an optimization problem: find the centroid partition with the smallest distance to that set. In addition to directly selecting the local centroid candidate, we present two combining methods based on similarity-based graph partitioning. Under certain conditions, the centroid partition is likely to be ranked near the top or the middle in terms of closeness to the true partition. Finally, we evaluate its effectiveness on both artificial and real datasets, with candidates drawn from either the full space or a subspace.
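The idea of selecting a local centroid among candidate partitions can be illustrated with a minimal sketch. The abstract does not state the exact distance functions, so this example assumes the variation of information, H(A) + H(B) - 2 I(A; B), as one well-known entropy-based distance between partitions; it is a stand-in, not necessarily the authors' definition. The function names (`entropy`, `mutual_information`, `vi_distance`, `local_centroid`) are hypothetical and chosen only for illustration. Each partition is given as a list of cluster labels, one per object, which matches the stated setting where only labelings are available.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two labelings of the same objects."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    joint = Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in joint.items()
    )


def vi_distance(a, b):
    """Variation of information: an assumed entropy-based distance between partitions."""
    return entropy(a) + entropy(b) - 2.0 * mutual_information(a, b)


def local_centroid(candidates):
    """Return the index of the candidate with the smallest total distance
    to all candidates, together with the list of total distances."""
    totals = [sum(vi_distance(p, q) for q in candidates) for p in candidates]
    best = min(range(len(candidates)), key=totals.__getitem__)
    return best, totals


if __name__ == "__main__":
    # Four candidate labelings of four objects; the first three describe the
    # same partition up to a renaming of cluster labels.
    candidates = [
        [0, 0, 1, 1],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
        [0, 1, 0, 1],
    ]
    best, totals = local_centroid(candidates)
    print("local centroid index:", best)
    print("total distances:", totals)
```

Because the distance depends only on the joint label distribution, relabeling clusters does not change it, so no label alignment step is needed before comparing partitions. This sketch only picks the best existing candidate; the graph-partitioning combiners mentioned in the abstract are not covered here.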