We study a novel clustering problem in which the pairwise relations between objects are categorical. This problem can be viewed as clustering the vertices of a graph whose edges are of different types (colors). We introduce an objective function that aims at partitioning the graph such that the edges within each cluster have, as much as possible, the same color. We show that the problem is NP-hard and propose a randomized algorithm with an approximation guarantee proportional to the maximum degree of the input graph. The algorithm iteratively picks a random edge as pivot, builds a cluster around it, and removes the cluster from the graph. Although fast, easy to implement, and parameter-free, this algorithm tends to produce a relatively large number of clusters. To overcome this issue, we introduce a variant algorithm, which modifies how the pivot is chosen and how the cluster is built around the pivot. Finally, to address the case where a fixed number of output clusters is required, we devise a third algorithm that directly optimizes the objective function via a strategy based on the alternating minimization paradigm. We test our algorithms on synthetic and real data from the domains of protein-interaction networks, social media, and bibliometrics. Experimental evidence shows that our algorithms outperform a baseline algorithm both in the task of reconstructing a ground-truth clustering and in terms of objective function value.
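The pivot-based procedure described above can be illustrated with a minimal Python sketch. The abstract does not say how the cluster is built around the pivot, so the rule used here is an assumption: the cluster contains the two endpoints of the pivot edge plus every remaining vertex joined to both endpoints by edges of the pivot's color. The function name `pivot_clustering`, the edge-dictionary input format, and the `seed` parameter are hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def pivot_clustering(edges, seed=None):
    """Sketch of a random edge-pivot clustering on an edge-colored graph.

    edges: dict mapping a vertex pair (u, v) to a color (hypothetical format).
    Vertices are assumed to be mutually comparable (e.g. all ints).
    Returns a list of (cluster, color) pairs; leftover vertices become
    singleton clusters with color None.
    """
    rng = random.Random(seed)

    # Symmetric adjacency map: adj[u][v] is the color of edge {u, v}.
    adj = defaultdict(dict)
    for (u, v), color in edges.items():
        adj[u][v] = color
        adj[v][u] = color

    alive = set(adj)  # vertices not yet assigned to a cluster
    clusters = []
    while True:
        # Edges whose endpoints are both still unclustered.
        live_edges = sorted(
            (u, v) for u in alive for v in adj[u] if v in alive and u < v
        )
        if not live_edges:
            break

        # Pick a random edge as pivot.
        u, v = rng.choice(live_edges)
        color = adj[u][v]

        # Assumed growth rule: add every vertex linked to BOTH pivot
        # endpoints by an edge of the pivot's color.
        cluster = {u, v}
        for w in alive - {u, v}:
            if adj[u].get(w) == color and adj[v].get(w) == color:
                cluster.add(w)

        # Remove the cluster from the graph.
        clusters.append((cluster, color))
        alive -= cluster

    # Vertices left without any live edge form singleton clusters.
    clusters.extend(({w}, None) for w in sorted(alive))
    return clusters
```

On a graph made of two disjoint single-color triangles, every possible pivot recovers its whole triangle, so the sketch returns the two triangles regardless of the random choices. On graphs with mixed colors the result depends on which pivots are drawn, which is consistent with the abstract's remark that the basic algorithm can produce many clusters.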