Ensemble methods create solutions to learning problems by constructing a set of individual (different) solutions, and subsequently suitably aggregating these, e.g., by weighted averaging of the predictions in regression, or by taking a weighted vote on the predictions in classification. Such methods, which include Bayesian model averaging, bagging and boosting, have already become very popular for supervised learning problems. For clustering, using ensembles can help to improve the quality and robustness of the results, to re-use existing "knowledge", and to deal with data-distributed situations where not all objects or features are simultaneously available for computations. Aggregation strategies can be based on the idea of minimizing "average" dissimilarity. If only the individual cluster memberships are used, this leads to an optimization problem which in general is computationally hard. For a specific similarity measure which in the crisp case uses overall discordance (modulo relabeling), the characterization of the optimal solution allows the construction of a greedy forward aggregation algorithm ("voting") which performs well on a number of clustering problems. Alternative aggregation strategies can be based on re-clustering the objects according to the rate of co-labeling, or by clustering the collection of memberships of all objects grouped according to the labels. We conclude with an outlook on possible further research on cluster ensembles.
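One of the aggregation strategies mentioned above, re-clustering the objects according to the rate of co-labeling, can be sketched in a few lines. The sketch below is illustrative only and not the paper's algorithm: it builds the co-labeling (co-association) rates from base partitions given as label vectors, then links object pairs whose rate exceeds a threshold and takes connected components; the threshold value and the union-find grouping are assumptions made for the example.

```python
import numpy as np

def co_association(labelings):
    """Rate at which each pair of objects receives the same label
    across the base clusterings (rows of `labelings`)."""
    labelings = np.asarray(labelings)          # shape (m, n): m clusterings, n objects
    m, n = labelings.shape
    C = np.zeros((n, n))
    for lab in labelings:
        C += (lab[:, None] == lab[None, :])    # 1 where a pair is co-labeled
    return C / m

def consensus_clusters(labelings, threshold=0.5):
    """Re-cluster by linking pairs co-labeled more often than `threshold`,
    i.e. connected components of the thresholded co-association graph."""
    C = co_association(labelings)
    n = C.shape[0]
    adj = C > threshold
    # simple union-find to collect connected components
    parent = list(range(n))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]      # path halving
            i = parent[i]
        return i
    for i in range(n):
        for j in range(i + 1, n):
            if adj[i, j]:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri
    roots = [find(i) for i in range(n)]
    # relabel components as 0, 1, 2, ... in order of first appearance
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]
```

Note that the result is invariant to the arbitrary label names used by the individual clusterings: three base partitions `[0,0,0,1,1,1]`, `[1,1,1,0,0,0]`, and `[2,2,0,1,1,1]` of six objects agree pairwise often enough that the consensus recovers the two groups `{0,1,2}` and `{3,4,5}`, even though no two of them use the same labels.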