Distributed community detection in web-scale networks

Authors:
Michael Ovelgönne
Affiliations:
UMIACS, University of Maryland, College Park, MD
Venue:
Proceedings of the 2013 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining
Year:
2013

Citing 13
Cited 0

The Strength of Weak Learnability

Machine Learning
A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs

SIAM Journal on Scientific Computing
On clusterings: Good, bad and spectral

Journal of the ACM (JACM)
Group formation in large social networks: membership, growth, and evolution

Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
SCAN: a structural clustering algorithm for networks

Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
MapReduce: simplified data processing on large clusters

Communications of the ACM - 50th anniversary issue: 1958 - 2008
A microscopic view on community detection in complex networks

Proceedings of the 2nd PhD workshop on Information and knowledge management
Parallel community detection on large networks with propinquity dynamics

Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Efficient Dense Structure Mining Using MapReduce

ICDMW '09 Proceedings of the 2009 IEEE International Conference on Data Mining Workshops
Design patterns for efficient graph algorithms in MapReduce

Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Cluster Cores and Modularity Maximization

ICDMW '10 Proceedings of the 2010 IEEE International Conference on Data Mining Workshops
Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks

Proceedings of the 20th international conference on World wide web

Quantified Score

Hi-index	0.00

Visualization

Abstract

Partitioning large networks into smaller subnetworks (communities) is an important tool to analyze the structure of complex linked systems. In recent years, many in-memory community detection algorithms have been proposed for graphs with millions of edges. Analyzing massive graphs with billions of edges is impossible for existing algorithms. In this contribution, we show how to find community partitions of networks with billions of edges. Our approach is based on an ensemble learning scheme for community detection that provides a way to identify high quality partitions from an ensemble of partitions with lower quality. We present a pre-processing procedure for community detection algorithms that significantly decreases the problem size. After reducing the problem size, traditional non-distributed community detection algorithms can be applied. We implemented a weak but highly scalable label propagation algorithm on top of the distributed-computing framework Apache Hadoop. The evaluation of our implementation on a 50-node Hadoop cluster and with evaluation datasets up to 3.3 billion edges shows very good results with respect to clustering quality as well as scalability. For a smaller 260 million edge network, we show that our preprocessing can improve the results of the popular Louvain modularity clustering algorithm.