Correlation Clustering was defined by Bansal, Blum, and Chawla as the problem of clustering a set of elements based on a possibly inconsistent binary similarity function between element pairs. Their setting is agnostic in the sense that a ground truth clustering is not assumed to exist, and the cost of a solution is computed against the input similarity function. This problem has been studied in theory and in practice and was subsequently proven to be APX-hard. In this work we assume that there does exist an unknown correct clustering of the data. In this setting, we argue that it is more reasonable to measure the output clustering's accuracy against the unknown underlying true clustering. We present two main results. The first is a novel method for continuously morphing a general (non-metric) function into a pseudometric. This technique may be useful for other metric embedding and clustering problems. The second is a simple algorithm for randomly rounding a pseudometric into a clustering. Combining the two, we obtain a certificate that an approximation factor strictly less than 2 is achievable for our problem. This approximation factor could not have been achieved by considering the agnostic version of the problem unless P = NP.
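The difference between the two ways of scoring a clustering can be illustrated with a small sketch. It is not taken from the paper: the function names `disagreements` and `distance_to_truth`, the dictionary encoding of the similarity function, and the choice of counting disagreeing pairs as the distance between two clusterings are all assumptions made here for illustration.

```python
from itertools import combinations


def disagreements(similar, clustering):
    """Agnostic cost: number of pairs on which the clustering contradicts
    the input binary similarity function.

    similar: dict mapping frozenset({u, v}) -> True (similar) / False (dissimilar)
    clustering: dict mapping element -> cluster label
    (Both encodings are assumptions of this sketch.)
    """
    cost = 0
    for u, v in combinations(sorted(clustering), 2):
        together = clustering[u] == clustering[v]
        if similar[frozenset((u, v))] != together:
            cost += 1
    return cost


def distance_to_truth(clustering, truth):
    """Number of pairs that one clustering puts together and the other
    separates. Used here as an assumed measure of accuracy against a
    ground truth clustering."""
    cost = 0
    for u, v in combinations(sorted(truth), 2):
        if (clustering[u] == clustering[v]) != (truth[u] == truth[v]):
            cost += 1
    return cost


# Hypothetical instance: the true clustering puts a, b, c together,
# but the observed similarity function marks (a, c) as dissimilar.
truth = {"a": 0, "b": 0, "c": 0}
similar = {
    frozenset(("a", "b")): True,
    frozenset(("b", "c")): True,
    frozenset(("a", "c")): False,  # inconsistent with the other two pairs
}

merged = {"a": 0, "b": 0, "c": 0}  # one cluster {a, b, c}
split = {"a": 0, "b": 0, "c": 1}   # clusters {a, b} and {c}

agnostic_costs = (disagreements(similar, merged), disagreements(similar, split))
truth_costs = (distance_to_truth(merged, truth), distance_to_truth(split, truth))
```

On this toy instance both candidate clusterings contradict the similarity function on exactly one pair, so the agnostic cost cannot tell them apart, while only `merged` matches the assumed ground truth. This is the kind of gap that motivates scoring against the true clustering.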