Combining partitions by probabilistic label aggregation

Authors:
Tilman Lange;Joachim M. Buhmann
Affiliations:
ETH Zurich, Zurich, Switzerland;ETH Zurich, Zurich, Switzerland
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Data clustering is an important tool in exploratory data analysis. The lack of objective criteria renders model selection, as well as the identification of robust solutions, particularly difficult. Stability assessment and the combination of multiple clustering solutions are important ingredients for finding useful partitions. In this work, we propose a novel way of combining multiple clustering solutions for both hard and soft partitions: the approach is based on modeling the probability that two objects are grouped together. An efficient EM optimization strategy is employed to estimate the model parameters. Our proposal can also be extended to emphasize the signal more strongly by weighting individual base clustering solutions according to their consistency with the prediction for previously unseen objects. In addition, the probabilistic model supports an out-of-sample extension that (i) makes it possible to assign previously unseen objects to classes of the combined solution and (ii) makes efficient aggregation of solutions possible. We also shed some light on the usefulness of such combination approaches. In the experimental results section, we demonstrate the competitive performance of our proposal in comparison with other recently proposed methods for combining multiple classifications of a finite data set.
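
The abstract's central quantity, the probability that two objects are grouped together, can be illustrated with a simple empirical estimate. The sketch below is not the EM-fitted model proposed in the paper; it only computes, for each pair of objects, the fraction of base partitions that place the pair in the same cluster (a plain co-association matrix). The function name `co_association` and the three example partitions are assumptions made for illustration.

```python
def co_association(partitions):
    """Return an n x n matrix whose (i, j) entry is the fraction of base
    partitions that assign objects i and j to the same cluster.

    Illustrative only: this is an empirical frequency, not the paper's
    probabilistic model or its EM estimation procedure.
    """
    if not partitions:
        raise ValueError("need at least one partition")
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same n objects")

    counts = [[0.0] * n for _ in range(n)]
    for labels in partitions:
        for i in range(n):
            for j in range(n):
                # Only label equality matters, so cluster names may differ
                # arbitrarily between base partitions.
                if labels[i] == labels[j]:
                    counts[i][j] += 1.0

    m = len(partitions)
    return [[c / m for c in row] for row in counts]


# Three hypothetical hard partitions of five objects.
partitions = [
    [0, 0, 1, 1, 1],
    [1, 1, 0, 0, 1],
    [0, 0, 0, 1, 1],
]
C = co_association(partitions)
print(C[0][1])  # objects 0 and 1 are grouped together in every partition
print(C[2][3])  # objects 2 and 3 are grouped together in two of three
```

Because only label equality within each partition is compared, the estimate is unaffected by how clusters are named in different base solutions, which is one reason pairwise formulations are convenient when combining partitions.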