Consensus clustering

  • Authors:
  • Tianming Hu; Sam Yuan Sung

  • Affiliations:
  • Department of Computer Science, DongGuan University of Technology, 1 University Road, SongShan Lake District, GuangDong 523808, China; Department of Computer Science, National University of Singapore, Singapore 117543

  • Venue:
  • Intelligent Data Analysis
  • Year:
  • 2005

Abstract

We address the consensus clustering problem: combining multiple partitions of a set of objects into a single consolidated partition. The input is a set of cluster labelings; we have no access to the original data or to the clustering algorithms that produced these partitions. After introducing the distribution-based view of partitions, we propose a series of entropy-based distance functions for comparing partitions. Given a set of candidate partitions, consensus clustering is then formalized as an optimization problem: searching for a centroid partition with the smallest distance to that set. In addition to directly selecting the local centroid from among the candidates, we present two combining methods based on similarity-based graph partitioning. Under certain conditions, the centroid partition is likely to rank at the top or in the middle in terms of closeness to the true partition. Finally, we evaluate the effectiveness of the approach on both artificial and real datasets, with candidates drawn from either the full space or a subspace.
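
The idea of picking a local centroid among the candidate partitions can be sketched in a few lines of Python. The abstract does not spell out the paper's distance functions, so this sketch assumes one well-known entropy-based distance, the variation of information, as a stand-in. The function names (entropy, variation_of_information, local_centroid) and the toy labelings are hypothetical and chosen only for illustration.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of the cluster-size distribution of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def variation_of_information(a, b):
    """VI(A, B) = 2 H(A, B) - H(A) - H(B).

    Assumption: used here as a representative entropy-based distance;
    it is not necessarily one of the distances defined in the paper.
    """
    joint = list(zip(a, b))  # joint labeling: one (label_a, label_b) pair per object
    return 2 * entropy(joint) - entropy(a) - entropy(b)


def local_centroid(candidates):
    """Index of the candidate with the smallest summed distance to all candidates."""
    def total_distance(i):
        return sum(variation_of_information(candidates[i], other)
                   for other in candidates)
    return min(range(len(candidates)), key=total_distance)


# Toy input: three labelings of the same six objects.
candidates = [
    [0, 0, 1, 1, 2, 2],
    [1, 1, 0, 0, 2, 2],  # same partition as the first, with labels permuted
    [0, 0, 0, 1, 1, 1],  # a coarser, different partition
]
best = local_centroid(candidates)
```

Because the distance depends only on how objects are grouped, not on the label names, the first two labelings are at distance zero from each other, and the selected centroid is one of them rather than the third. This only covers direct selection from the candidates; the graph-partitioning combiners mentioned in the abstract are not sketched here.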