On constructing an optimal consensus clustering from multiple clusterings

Authors:
Piotr Berman; Bhaskar DasGupta; Ming-Yang Kao; Jie Wang
Affiliations:
Department of Computer Science & Engineering, Pennsylvania State University, University Park, PA 16802, USA; Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, USA; Department of Electrical Engineering & Computer Science, Northwestern University, Evanston, IL 60208, USA; Department of Computer Science, University of Massachusetts Lowell, Lowell, MA 01854, USA
Venue:
Information Processing Letters
Year:
2007

Abstract

Computing a suitable measure of consensus among several clusterings of the same data is an important problem that arises in several areas, such as computational biology and data mining. In this paper, we formalize a set-theoretic model for computing such a similarity measure. Roughly speaking, in this model we have k > 1 partitions (clusters) of the same data set, each containing the same number of sets, and the goal is to align the sets in each partition to minimize a similarity measure. For k=2, a polynomial-time solution was proposed by Gusfield (Information Processing Letters 82 (2002) 159-164). In this paper, we show that the problem is MAX-SNP-hard for k=3 even if each partition in each cluster contains no more than 2 elements, and we provide an approximation algorithm for the problem for any k.
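
To make the alignment idea concrete for the k=2 case, the following is a minimal sketch rather than the algorithm of either paper. It assumes the measure is the number of elements that must be deleted so that the two partitions coincide (the partition distance), and it finds the best alignment by brute force over all permutations, which is only practical for a handful of sets. The function name `partition_distance` and the sample data are hypothetical.

```python
from itertools import permutations


def partition_distance(p1, p2):
    """Brute-force alignment of two partitions of the same data set.

    Assumption (not taken from the paper): the measure is the number of
    elements lying outside the intersections of aligned sets.
    Returns (distance, alignment), where alignment[i] is the index of
    the set in p2 matched with set i of p1.
    """
    if len(p1) != len(p2):
        raise ValueError("both partitions must contain the same number of sets")
    n = sum(len(s) for s in p1)  # size of the underlying data set
    best_overlap = -1
    best_perm = None
    # Try every way of pairing the sets of p1 with the sets of p2.
    for perm in permutations(range(len(p2))):
        overlap = sum(len(p1[i] & p2[j]) for i, j in enumerate(perm))
        if overlap > best_overlap:
            best_overlap, best_perm = overlap, perm
    return n - best_overlap, best_perm


# Hypothetical example: two partitions of {1, ..., 6} into three sets each.
a = [{1, 2, 3}, {4, 5}, {6}]
b = [{4, 5, 6}, {1, 2}, {3}]
dist, alignment = partition_distance(a, b)
print(dist, alignment)
```

In this example, deleting elements 3 and 6 makes the two partitions identical. Choosing the best pairing of sets is an assignment problem, so a polynomial-time method such as the Hungarian algorithm could replace the permutation loop; the brute-force version is kept here only because it needs nothing beyond the standard library.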