In this paper, we present a community-detection solution for massive-scale social networks built on MapReduce, a parallel programming framework. We use a similarity metric to model community probability, and we design the model to be parallelizable and scalable within the MapReduce framework. More importantly, we propose DEPOLD (DElayed Processing of Large Degree nodes), a set of degree-based preprocessing and postprocessing techniques that significantly improve both community-detection accuracy and performance. With DEPOLD, deferring the analysis of the 1% of nodes with the highest degree to the postprocessing stage reduces both processing time and storage space by an order of magnitude. DEPOLD is also applicable to other graph-clustering problems. Furthermore, we design and implement two MapReduce algorithms for similarity calculation that have different computation and communication characteristics, so that the solution can adapt to various system configurations. Finally, we conduct experiments on publicly available datasets. Our evaluation demonstrates the effectiveness, efficiency, and scalability of the proposed solution.
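The following sketch shows how a degree-based delay could wrap a similarity-based clustering pipeline. It is an illustration under stated assumptions, not the authors' algorithm. The abstract does not specify the similarity metric, the clustering rule, or how delayed nodes are reassigned, so this code makes three assumptions. It uses Jaccard similarity of closed neighborhoods. It treats pairs above a fixed threshold as connected components. It assigns each delayed node by a majority vote over its neighbors' communities. All function names (`split_by_degree`, `map_phase`, `reduce_phase`, `detect_communities`, and so on) are hypothetical. The MapReduce round is simulated in memory in a single process rather than run on Hadoop.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_adj(edges):
    """Undirected adjacency sets from an edge list."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return dict(adj)


def split_by_degree(adj, frac=0.01):
    """Preprocessing (assumed DEPOLD-like rule): hold back the top `frac` of nodes by degree."""
    k = max(1, int(len(adj) * frac))
    ranked = sorted(adj, key=lambda v: (-len(adj[v]), v))
    delayed = set(ranked[:k])
    # Remove delayed nodes and all edges touching them from the graph to be clustered.
    core = {v: adj[v] - delayed for v in adj if v not in delayed}
    return core, delayed


def map_phase(v, neighbors):
    """Mapper: every pair inside v's closed neighborhood N[v] has v as a shared member."""
    for u, w in combinations(sorted(neighbors | {v}), 2):
        yield (u, w), 1


def reduce_phase(pair, counts, degree):
    """Reducer (assumed metric): Jaccard similarity |N[u] & N[w]| / |N[u] | N[w]|."""
    u, w = pair
    common = sum(counts)
    union = (degree[u] + 1) + (degree[w] + 1) - common
    return common / union


def similarities(core):
    """Simulate one MapReduce round in memory: map, shuffle by key, reduce."""
    shuffled = defaultdict(list)
    for v, nbrs in core.items():
        for key, value in map_phase(v, nbrs):
            shuffled[key].append(value)
    degree = {v: len(nbrs) for v, nbrs in core.items()}
    return {pair: reduce_phase(pair, c, degree) for pair, c in shuffled.items()}


def cluster(core, sims, threshold=0.5):
    """Union-find over pairs with similarity >= threshold; label = smallest node id."""
    parent = {v: v for v in core}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (u, w), s in sims.items():
        if s >= threshold:
            ru, rw = find(u), find(w)
            if ru != rw:
                parent[max(ru, rw)] = min(ru, rw)
    return {v: find(v) for v in core}


def assign_delayed(adj, labels, delayed):
    """Postprocessing (assumed): a delayed node joins its neighbors' most common community."""
    result = dict(labels)
    for v in sorted(delayed):
        votes = Counter(labels[u] for u in adj[v] if u in labels)
        # Ties go to the smallest label; isolated delayed nodes form their own community.
        result[v] = max(sorted(votes), key=votes.get) if votes else v
    return result


def detect_communities(adj, frac=0.01, threshold=0.5):
    core, delayed = split_by_degree(adj, frac)
    labels = cluster(core, similarities(core), threshold)
    return assign_delayed(adj, labels, delayed)


if __name__ == "__main__":
    # Two 4-cliques joined by one bridge edge, plus a hub (node 8) linked to every node.
    edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
             (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
    edges += [(8, v) for v in range(8)]
    print(detect_communities(build_adj(edges)))
```

In this sketch, the mapper emits on the order of d² pairs for a node of degree d. A single hub therefore dominates both shuffle volume and intermediate storage. Holding hubs back until postprocessing is one plausible way a small delayed fraction could yield large savings. In the example, the hub would otherwise connect the two cliques through its neighborhood.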