Coupled clustering ensemble: Incorporating coupling relationships both between base clusterings and objects

  • Authors:
  • Can Wang;Longbing Cao;Zhong She

  • Affiliations:
  • Advanced Analytics Institute, University of Technology, Sydney, Australia;Advanced Analytics Institute, University of Technology, Sydney, Australia;Advanced Analytics Institute, University of Technology, Sydney, Australia

  • Venue:
  • ICDE '13 Proceedings of the 2013 IEEE International Conference on Data Engineering (ICDE 2013)
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Clustering ensemble is a powerful approach for improving the accuracy and stability of individual (base) clustering algorithms. Most of the existing clustering ensemble methods obtain the final solutions by assuming that base clusterings perform independently with one another and all objects are independent too. However, in real-world data sources, objects are more or less associated in terms of certain coupling relationships. Base clusterings trained on the source data are complementary to one another since each of them may only capture some specific rather than full picture of the data. In this paper, we discuss the problem of explicating the dependency between base clusterings and between objects in clustering ensembles, and propose a framework for coupled clustering ensembles (CCE). CCE not only considers but also integrates the coupling relationships between base clusterings and between objects. Specifically, we involve both the intra-coupling within one base clustering (i.e., cluster label frequency distribution) and the inter-coupling between different base clusterings (i.e., cluster label co-occurrence dependency). Furthermore, we engage both the intra-coupling between two objects in terms of the base clustering aggregation and the inter-coupling among other objects in terms of neighborhood relationship. This is the first work which explicitly addresses the dependency between base clusterings and between objects, verified by the application of such couplings in three types of consensus functions: clustering-based, object-based and cluster-based. Substantial experiments on synthetic and UCI data sets demonstrate that the CCE framework can effectively capture the interactions embedded in base clusterings and objects with higher clustering accuracy and stability compared to several state-of-the-art techniques, which is also supported by statistical analysis.