A cluster ensembles framework

  • Authors:
  • Evgenia Dimitriadou; Andreas Weingessel; Kurt Hornik

  • Affiliations:
  • Institute of Statistics, Technische Universität Vienna, Austria; Institute of Statistics, Wirtschaftsuniversität Vienna, Austria; Institute of Statistics, Wirtschaftsuniversität Vienna, Austria

  • Venue:
  • Design and application of hybrid intelligent systems
  • Year:
  • 2003


Abstract

Ensemble methods create solutions to learning problems by constructing a set of individual (different) solutions, and subsequently suitably aggregating these, e.g., by weighted averaging of the predictions in regression, or by taking a weighted vote on the predictions in classification. Such methods, which include Bayesian model averaging, bagging and boosting, have already become very popular for supervised learning problems. For clustering, using ensembles can help to improve the quality and robustness of the results, to re-use existing "knowledge", and to deal with data-distributed situations where not all objects or features are simultaneously available for computations. Aggregation strategies can be based on the idea of minimizing "average" dissimilarity. If only the individual cluster memberships are used, this leads to an optimization problem which in general is computationally hard. For a specific similarity measure which in the crisp case uses overall discordance (modulo relabeling), the characterization of the optimal solution allows the construction of a greedy forward aggregation algorithm ("voting") which performs well on a number of clustering problems. Alternative aggregation strategies can be based on re-clustering the objects according to the rate of co-labeling, or by clustering the collection of memberships of all objects grouped according to the labels. We conclude with an outlook on possible further research on cluster ensembles.
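One of the alternative aggregation strategies mentioned above re-clusters objects according to their rate of co-labeling across the base clusterings. A minimal sketch of that idea, assuming hard (crisp) label vectors and a simple threshold-plus-connected-components grouping (the function names and the threshold rule are illustrative choices, not the authors' voting algorithm):

```python
import numpy as np

def coassociation_matrix(labelings):
    """Fraction of base clusterings in which each pair of objects
    shares a label -- the 'rate of co-labeling'."""
    labelings = np.asarray(labelings)   # shape: (n_clusterings, n_objects)
    m, _ = labelings.shape
    co = np.zeros((labelings.shape[1], labelings.shape[1]))
    for labels in labelings:
        # 1 where objects i and j got the same label in this clustering
        co += (labels[:, None] == labels[None, :])
    return co / m

def recluster(co, threshold=0.5):
    """Group objects whose co-labeling rate exceeds `threshold`,
    using connected components (a single-link-style consensus)."""
    n = co.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] >= 0:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            for k in np.nonzero(co[j] > threshold)[0]:
                if labels[k] < 0:
                    labels[k] = current
                    stack.append(k)
        current += 1
    return labels
```

Note that the co-association matrix is invariant to how each base clustering names its clusters, which sidesteps the relabeling (label-matching) problem that the voting approach must solve explicitly: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` produce identical co-labeling entries.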