Symmetrizations for clustering directed graphs

Authors:
Venu Satuluri;Srinivasan Parthasarathy
Affiliations:
The Ohio State University;The Ohio State University
Venue:
Proceedings of the 14th International Conference on Extending Database Technology
Year:
2011

Abstract

Graph clustering has generally concerned itself with undirected graphs; however, the graphs in a number of important domains are inherently directed, e.g., networks of web pages, research papers, and Twitter users. This paper investigates various ways of symmetrizing a directed graph into an undirected graph so that previous work on clustering undirected graphs can subsequently be leveraged. Recent work on clustering directed graphs has generalized objective functions such as conductance to directed graphs and minimized them using spectral methods. We show that more meaningful clusters (as measured by an external ground-truth criterion) can be obtained by symmetrizing the graph using measures that capture in-link and out-link similarity, such as bibliographic coupling and co-citation strength. However, directly applying these similarity measures to modern large-scale power-law networks is problematic because of hub nodes, which become connected to the vast majority of the network in the transformed undirected graph. We carefully analyze this problem and propose a Degree-discounted similarity measure that is much better suited to large-scale networks, and we validate it with extensive experiments.
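
The symmetrizations named in the abstract can be illustrated with a small sketch. Bibliographic coupling and co-citation have standard matrix forms (A Aᵀ and Aᵀ A for an adjacency matrix A). The abstract does not state the Degree-discounted formula, so the version below, which scales each shared link by inverse powers of the in- and out-degrees, is an assumption for illustration, as are the function names, the exponent defaults of 0.5, and the toy graph.

```python
import numpy as np

def bibliographic_coupling(A):
    # Entry (i, j) counts the nodes that both i and j point to.
    return A @ A.T

def co_citation(A):
    # Entry (i, j) counts the nodes that point to both i and j.
    return A.T @ A

def inverse_degree_power(deg, exponent):
    # deg ** -exponent, with zero-degree nodes mapped to 0 to avoid division by zero.
    scale = np.zeros_like(deg, dtype=float)
    nonzero = deg > 0
    scale[nonzero] = deg[nonzero] ** -exponent
    return np.diag(scale)

def degree_discounted(A, alpha=0.5, beta=0.5):
    # Assumed form: discount out-link similarity by in-degree of the shared
    # target (and vice versa), then normalize by the degrees of the endpoints.
    A = np.asarray(A, dtype=float)
    Do = inverse_degree_power(A.sum(axis=1), alpha)  # out-degree scaling
    Di = inverse_degree_power(A.sum(axis=0), beta)   # in-degree scaling
    coupling = Do @ A @ Di @ A.T @ Do
    cocitation = Di @ A.T @ Do @ A @ Di
    return coupling + cocitation

# Toy directed graph (hypothetical): node 4 is a hub that every other node links to;
# nodes 0 and 1 additionally both link to node 2.
A = np.array([
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 0, 0, 0, 0],
], dtype=float)

U_plain = bibliographic_coupling(A) + co_citation(A)
U_disc = degree_discounted(A)
```

In `U_plain`, every pair of nodes that links to the hub receives an edge with weight at least 1 purely because of that shared hub link. In `U_disc`, a shared link to the hub contributes less than a shared link to a low-in-degree node, which is the intuition behind discounting by degree. For graphs of realistic size, `scipy.sparse` matrices would be a more practical choice than dense arrays.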