Combining Multiple Weak Clusterings

  • Authors:
  • Alexander Topchy; Anil K. Jain; William Punch

  • Venue:
  • ICDM '03 Proceedings of the Third IEEE International Conference on Data Mining
  • Year:
  • 2003

Abstract

A data set can be clustered in many ways depending on the clustering algorithm employed, parameter settings used and other factors. Can multiple clusterings be combined so that the final partitioning of data provides better clustering? The answer depends on the quality of clusterings to be combined as well as the properties of the fusion method. First, we introduce a unified representation for multiple clusterings and formulate the corresponding categorical clustering problem. As a result, we show that the consensus function is related to the classical intra-class variance criterion using the generalized mutual information definition. Second, we show the efficacy of combining partitions generated by weak clustering algorithms that use data projections and random data splits. A simple explanatory model is offered for the behavior of combinations of such weak clustering components. We analyze the combination accuracy as a function of parameters controlling the power and resolution of component partitions as well as the learning dynamics vs. the number of clusterings involved. Finally, some empirical studies compare the effectiveness of several consensus functions.
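
To make the idea of combining weak clusterings concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes one particular weak component (a random one-dimensional projection followed by 2-means) and swaps in a simple co-association consensus with thresholded linking in place of the mutual-information-based consensus function described in the abstract. All function names and parameters (weak_partition, co_association, consensus, threshold) are hypothetical and chosen for illustration only.

```python
import random


def weak_partition(points, rng, iters=20):
    """One weak clustering: project onto a random direction, then run 1-D 2-means."""
    dim = len(points[0])
    direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    proj = [sum(d * x for d, x in zip(direction, p)) for p in points]
    centers = [min(proj), max(proj)]  # simple deterministic initialisation (assumption)
    labels = [0] * len(proj)
    for _ in range(iters):
        # Assign each projected value to the nearer of the two centres.
        labels = [0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1 for v in proj]
        for k in (0, 1):
            members = [v for v, lab in zip(proj, labels) if lab == k]
            if members:  # keep the old centre if a cluster ends up empty
                centers[k] = sum(members) / len(members)
    return labels


def co_association(partitions):
    """Fraction of partitions in which each pair of objects shares a label."""
    n = len(partitions[0])
    h = len(partitions)
    return [[sum(p[i] == p[j] for p in partitions) / h for j in range(n)]
            for i in range(n)]


def consensus(partitions, threshold=0.5):
    """Link objects whose co-association exceeds the threshold and label the
    resulting connected components (a stand-in consensus, not the paper's)."""
    co = co_association(partitions)
    n = len(co)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and co[i][j] > threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels
```

Each row of the list passed to consensus is one partition, so the columns can be read as the categorical label vector of each object across the ensemble. On two well-separated groups of points, most random projections are expected to keep the groups apart, so the thresholded co-association should link objects within a group and not across groups, even though any single projection may split the data poorly.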