Scalable community detection in massive social networks using MapReduce

Authors:
J. Shi;W. Xue;W. Wang;Y. Zhang;B. Yang;J. Li
Affiliations:
IBM Research - China, Haidian District, Beijing, China;Tencent, Inc., Haidian District, Beijing, China;Shanghai Synacast Media Tech (PPLive), Inc., PuDong New District, Shanghai, China;Qihoo 360 Technology Company Limited;IBM Software Group, China Development Laboratory, Beijing, China;IBM Research - Austin, Austin, TX
Venue:
IBM Journal of Research and Development
Year:
2013

Abstract

In this paper, we present a community-detection solution for massive-scale social networks using MapReduce, a parallel programming framework. We use a similarity metric to model the community probability, and the model is designed to be parallelizable and scalable in the MapReduce framework. More importantly, we propose a set of degree-based preprocessing and postprocessing techniques named DEPOLD (DElayed Processing of Large Degree nodes) that significantly improve both the accuracy and the performance of community detection. With DEPOLD, delaying the analysis of 1% of high-degree nodes to the postprocessing stage reduces both processing time and storage space by an order of magnitude. DEPOLD can also be applied to other graph-clustering problems. Furthermore, we design and implement two MapReduce similarity-calculation algorithms with different computation and communication characteristics, so that the solution can adapt to various system configurations. Finally, we conduct experiments with publicly available datasets. Our evaluation demonstrates the effectiveness, efficiency, and scalability of the proposed solution.
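
To make the idea of delaying large-degree nodes concrete, here is a minimal single-machine sketch in Python. It is not the authors' implementation. The abstract does not name the similarity metric, so Jaccard similarity over neighbor sets is an assumption, and the function names (`build_adjacency`, `split_by_degree`, `map_phase`, `reduce_phase`) are hypothetical. The map and reduce steps are ordinary Python functions standing in for MapReduce jobs, and the postprocessing of the delayed nodes is omitted because the abstract does not describe it.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from edge pairs."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    return dict(adjacency)


def split_by_degree(adjacency, fraction=0.01):
    """Set aside the top `fraction` of nodes by degree (at least one node).

    Returns (kept, delayed): `kept` is the graph without the delayed nodes,
    `delayed` is the set of nodes left for a later postprocessing stage.
    """
    ranked = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)
    n_delayed = max(1, int(len(ranked) * fraction))
    delayed = set(ranked[:n_delayed])
    kept = {
        node: neighbors - delayed
        for node, neighbors in adjacency.items()
        if node not in delayed
    }
    return kept, delayed


def map_phase(adjacency):
    """Map step: each node emits ((u, v), 1) for every pair of its neighbors.

    A node of degree d emits d * (d - 1) / 2 records, so a few very
    high-degree nodes can dominate the intermediate data volume.
    """
    for neighbors in adjacency.values():
        for u, v in combinations(sorted(neighbors), 2):
            yield (u, v), 1


def reduce_phase(pairs, adjacency):
    """Reduce step: sum common-neighbor counts and turn them into Jaccard scores."""
    common = defaultdict(int)
    for key, count in pairs:
        common[key] += count
    return {
        (u, v): shared / len(adjacency[u] | adjacency[v])
        for (u, v), shared in common.items()
    }


if __name__ == "__main__":
    # Toy graph (assumed data): one hub plus a short path a-b-c.
    edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"),
             ("a", "b"), ("b", "c")]
    graph = build_adjacency(edges)
    kept, delayed = split_by_degree(graph, fraction=0.2)
    scores = reduce_phase(map_phase(kept), kept)
    print("delayed:", delayed)
    print("similarity:", scores)
```

In this sketch the map step's output grows quadratically with node degree, which is one plausible reason why setting aside a small share of the highest-degree nodes could cut intermediate data substantially. The actual savings reported in the abstract come from the authors' own algorithms and datasets, not from this toy example.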