As clustering methods are often sensitive to parameter tuning, obtaining stable clustering results is an important task. In this work, we aim to improve clustering stability by diminishing the influence of algorithmic inconsistencies and enhancing the signal that comes from the data. We propose a mechanism that takes m clusterings as input and outputs m clusterings of comparable quality that are in higher agreement with each other. We call our method the Clustering Agreement Process (CAP). To preserve clustering quality, CAP uses the same optimization procedure as the underlying clustering method. In particular, we study the stability problem for randomized clustering methods (which usually produce different results on each run). We focus on methods based on inference in a combinatorial Markov Random Field (Comraf, for short) of simple topology, and we instantiate CAP as inference within a more complex, bipartite Comraf. We test the resulting system on four datasets: three medium-sized text collections and one large-scale user/movie dataset. First, in all four cases, our system significantly improves clustering stability as measured by the macro-averaged Jaccard index. Second, in all four cases it also significantly improves clustering quality, achieving state-of-the-art results. Third, it significantly improves the stability of consensus clustering built on top of the randomized clustering solutions.
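The stability measure named above can be made concrete. Below is a minimal sketch of one common formulation of macro-averaged Jaccard agreement: each cluster in one clustering is matched to its best counterpart in another by set-wise Jaccard similarity, the scores are averaged over clusters, and stability over m runs is the mean pairwise agreement. The exact matching and averaging scheme used in the paper may differ; the function names here are illustrative only.

```python
from itertools import combinations

def jaccard(a, b):
    """Jaccard similarity |a ∩ b| / |a ∪ b| of two item sets."""
    return len(a & b) / len(a | b)

def macro_avg_jaccard(c1, c2):
    """Match each cluster in c1 to its best-matching cluster in c2
    and average the Jaccard scores (one common formulation; the
    paper's exact definition may differ)."""
    return sum(max(jaccard(a, b) for b in c2) for a in c1) / len(c1)

def stability(clusterings):
    """Mean pairwise macro-averaged Jaccard agreement over m
    clusterings of the same data (e.g., m runs of a randomized
    clustering method)."""
    pairs = list(combinations(clusterings, 2))
    return sum(macro_avg_jaccard(a, b) for a, b in pairs) / len(pairs)

# Example: three runs on four items; two runs agree, one diverges.
runs = [
    [{1, 2}, {3, 4}],
    [{1, 2}, {3, 4}],
    [{1, 3}, {2, 4}],
]
print(stability(runs))  # < 1.0, since one run disagrees with the others
```

Under this formulation, identical clusterings score 1.0, and a CAP-style post-processing step that raises agreement among the m outputs raises this score.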