Clustering with qualitative information

Authors:
Moses Charikar;Venkatesan Guruswami;Anthony Wirth
Affiliations:
Department of Computer Science, Princeton University, 35 Olden St., Princeton, NJ 08544-2087, USA;University of Washington, USA;Department of Computer Science, Princeton University, 35 Olden St., Princeton, NJ 08544-2087, USA
Venue:
Journal of Computer and System Sciences - Special issue: Learning theory 2003
Year:
2005

Citing 12
Cited 28

Toward Efficient Agnostic Learning

Machine Learning - Special issue on computational learning theory, COLT'92
Rounding algorithms for a geometric embedding of minimum multiway cut

STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Approximating Minimum Subset Feedback Sets in Undirected Graphs with Applications

SIAM Journal on Discrete Mathematics
An improved approximation algorithm for MULTIWAY CUT

Journal of Computer and System Sciences - 30th annual ACM symposium on theory of computing
Some optimal inapproximability results

Journal of the ACM (JACM)
On the power of unique 2-prover 1-round games

STOC '02 Proceedings of the thirty-fourth annual ACM symposium on Theory of computing
Approximate Max-Flow Min-(Multi)Cut Theorems and Their Applications

SIAM Journal on Computing
Correlation Clustering

FOCS '02 Proceedings of the 43rd Symposium on Foundations of Computer Science
Improved Approximation Algorithms for MAX k-CUT and MAX BISECTION

Proceedings of the 4th International IPCO Conference on Integer Programming and Combinatorial Optimization
Cluster Graph Modification Problems

WG '02 Revised Papers from the 28th International Workshop on Graph-Theoretic Concepts in Computer Science
Correlation Clustering

Machine Learning
Graph-modeled data clustering: fixed-parameter algorithms for clique generation

CIAC'03 Proceedings of the 5th Italian conference on Algorithms and complexity

Correlation clustering with a fixed number of clusters

SODA '06 Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithms
Invitation to data reduction and problem kernelization

ACM SIGACT News
Supervised clustering of streaming data for email batch detection

Proceedings of the 24th international conference on Machine learning
A note on the inapproximability of correlation clustering

Information Processing Letters
Clustering with Partial Information

MFCS '08 Proceedings of the 33rd international symposium on Mathematical Foundations of Computer Science
A Local-Search 2-Approximation for 2-Correlation-Clustering

ESA '08 Proceedings of the 16th annual European symposium on Algorithms
Closest 4-leaf power is fixed-parameter tractable

Discrete Applied Mathematics
Information Extraction

Foundations and Trends in Databases
A more effective linear kernelization for cluster editing

Theoretical Computer Science
Efficient top-k count queries over imprecise duplicates

Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology
On the approximability of the Maximum Agreement SubTree and Maximum Compatible Tree problems

Discrete Applied Mathematics
Bounding and comparing methods for correlation clustering beyond ILP

ILP '09 Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing
Framework for evaluating clustering algorithms in duplicate detection

Proceedings of the VLDB Endowment
Clustering with partial information

Theoretical Computer Science
Exact algorithms for cluster editing: evaluation and experiments

WEA'08 Proceedings of the 7th international conference on Experimental algorithms
Untangling the cross-lingual link structure of Wikipedia

ACL '10 Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics
Correlation clustering with noisy input

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A polynomial time approximation scheme for k-consensus clustering

SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
A 2k Kernel for the cluster editing problem

COCOON'10 Proceedings of the 16th annual international conference on Computing and combinatorics
Generalized graph clustering: recognizing (p, q)-cluster graphs

WG'10 Proceedings of the 36th international conference on Graph-theoretic concepts in computer science
Improved approximation algorithms for bipartite correlation clustering

ESA'11 Proceedings of the 19th European conference on Algorithms
A 2k kernel for the cluster editing problem

Journal of Computer and System Sciences
Fitting Tree Metrics: Hierarchical Clustering and Phylogeny

SIAM Journal on Computing
Tight hardness results for minimizing discrepancy

Proceedings of the twenty-second annual ACM-SIAM symposium on Discrete Algorithms
The cluster editing problem: implementations and experiments

IWPEC'06 Proceedings of the Second international conference on Parameterized and Exact Computation
A randomized PTAS for the minimum Consensus Clustering with a fixed number of clusters

Theoretical Computer Science
Efficient parameterized preprocessing for cluster editing

FCT'07 Proceedings of the 16th international conference on Fundamentals of Computation Theory
A more effective linear kernelization for Cluster Editing

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Abstract

We consider the problem of clustering a collection of elements based on pairwise judgments of similarity and dissimilarity. Bansal et al. (in: Proceedings of 43rd FOCS, 2002, pp. 238-247) cast the problem thus: given a graph G whose edges are labeled "+" (similar) or "-" (dissimilar), partition the vertices into clusters so that the number of pairs correctly (resp., incorrectly) classified with respect to the input labeling is maximized (resp., minimized). It is worthwhile studying both complete graphs, in which every edge is labeled, and general graphs, in which some input edges might not have labels. We answer several questions left open by Bansal et al. (2002) and provide a sound overview of clustering with qualitative information. Specifically, we demonstrate a factor 4 approximation for minimization on complete graphs, and a factor O(log n) approximation for general graphs. For the maximization version, a PTAS for complete graphs was shown by Bansal et al. (2002); we give a factor 0.7664 approximation for general graphs, noting that a PTAS is unlikely by proving APX-hardness. We also prove the APX-hardness of minimization on complete graphs.
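
To make the two objectives in the abstract concrete, the following is a minimal sketch that scores a given clustering against a "+"/"-" edge labeling. It only evaluates a partition and is not any of the approximation algorithms from the paper. The function name `count_agreements`, the dictionary-based input format, and the toy graph are all assumptions made for illustration.

```python
def count_agreements(labels, clustering):
    """Score a clustering against a "+"/"-" edge labeling.

    labels:     dict mapping an edge (u, v) to "+" (similar) or "-" (dissimilar).
                Pairs absent from the dict are unlabeled and are ignored,
                which corresponds to the general (non-complete) graph setting.
    clustering: dict mapping each vertex to a cluster identifier.

    Returns (agreements, disagreements). Hypothetical helper for illustration.
    """
    agreements = 0
    disagreements = 0
    for (u, v), sign in labels.items():
        same_cluster = clustering[u] == clustering[v]
        # A "+" edge is classified correctly when its endpoints share a cluster;
        # a "-" edge is classified correctly when they are separated.
        if (sign == "+") == same_cluster:
            agreements += 1
        else:
            disagreements += 1
    return agreements, disagreements


# Toy instance (assumed data): four vertices, the pair (a, d) is unlabeled.
labels = {
    ("a", "b"): "+",
    ("a", "c"): "+",
    ("b", "c"): "-",
    ("c", "d"): "+",
}

two_clusters = {"a": 0, "b": 0, "c": 1, "d": 1}
singletons = {"a": 0, "b": 1, "c": 2, "d": 3}

print(count_agreements(labels, two_clusters))
print(count_agreements(labels, singletons))
```

On this toy instance no partition satisfies every label, because a and b, and a and c, should be together while b and c should be apart. Maximizing the first returned value and minimizing the second select the same optimal partitions, but an approximation factor for one objective does not carry over to the other, which is why the abstract states separate results for maximization and minimization.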