Scalable community detection in massive social networks using MapReduce

  • Authors:
  • J. Shi; W. Xue; W. Wang; Y. Zhang; B. Yang; J. Li

  • Affiliations:
  • IBM Research - China, Haidian District, Beijing, China; Tencent, Inc., Haidian District, Beijing, China; Shanghai Synacast Media Tech (PPLive), Inc., PuDong New District, Shanghai, China; Qihoo 360 Technology Company Limited; IBM Software Group, China Development Laboratory, Beijing, China; IBM Research - Austin, Austin, TX

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2013

Abstract

In this paper, we present a community-detection solution for massive-scale social networks using MapReduce, a parallel programming framework. We use a similarity metric to model the community probability, and the model is designed to be parallelizable and scalable in the MapReduce framework. More importantly, we propose a set of degree-based preprocessing and postprocessing techniques named DEPOLD (DElayed Processing of Large Degree nodes) that significantly improve both the community-detection accuracy and performance. With DEPOLD, delaying analysis of 1% of high-degree nodes to the postprocessing stage reduces both processing time and storage space by one order of magnitude. DEPOLD can be applied to other graph-clustering problems. Furthermore, we design and implement two similarity calculation algorithms using MapReduce with different computation and communication characteristics in order to adapt to various system configurations. Finally, we conduct experiments with publicly available datasets. Our evaluation demonstrates the effectiveness, efficiency, and scalability of the proposed solution.
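The idea of delaying high-degree nodes can be illustrated with a small single-machine sketch. It is not the paper's implementation: the abstract does not specify the similarity metric or the MapReduce job layout, so this example assumes Jaccard similarity over neighbor sets and emulates the map and reduce phases with plain Python functions. All function names (`build_adjacency`, `split_by_degree`, `map_phase`, `reduce_phase`) and the `fraction` parameter are hypothetical. What the sketch does show is why high-degree nodes dominate cost in this kind of formulation: a node with d neighbors emits d(d-1)/2 candidate pairs in the map phase.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from an iterable of (u, v) edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def split_by_degree(adj, fraction=0.01):
    """Set aside the top `fraction` of nodes by degree (assumed DEPOLD-style
    preprocessing). Returns the reduced graph and the set of delayed nodes."""
    n_delayed = max(1, int(len(adj) * fraction))
    ranked = sorted(adj, key=lambda n: (-len(adj[n]), n))
    delayed = set(ranked[:n_delayed])
    kept = {n: nbrs - delayed for n, nbrs in adj.items() if n not in delayed}
    return kept, delayed


def map_phase(adj):
    """Emulated mapper: each node emits every pair of its neighbors,
    tagged with itself as a common neighbor of that pair."""
    for node, nbrs in adj.items():
        for a, b in combinations(sorted(nbrs), 2):
            yield (a, b), node


def reduce_phase(pairs, adj):
    """Emulated reducer: group common neighbors per pair, then compute
    Jaccard similarity (an assumption, not the paper's stated metric)."""
    common = defaultdict(set)
    for key, node in pairs:
        common[key].add(node)
    return {
        (a, b): len(shared) / len(adj[a] | adj[b])
        for (a, b), shared in common.items()
    }


# Toy graph: hub "h" linked to four nodes, plus one edge between "a" and "b".
edges = [("h", "a"), ("h", "b"), ("h", "c"), ("h", "d"), ("a", "b")]
adj = build_adjacency(edges)

pairs_full = list(map_phase(adj))
kept, delayed = split_by_degree(adj, fraction=0.2)
pairs_kept = list(map_phase(kept))
sims = reduce_phase(pairs_full, adj)
```

On this toy graph the hub alone accounts for six of the eight emitted pairs, and setting it aside leaves no pairs to process. In a real pipeline the delayed nodes would be assigned to communities in a later postprocessing step, which this sketch leaves out.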