Combination of in-memory graph computation with mapreduce: a subgraph-centric method of pagerank

Authors:
Qiuhong Li;Wei Wang;Peng Wang;Ke Dai;Zhihui Wang;Yang Wang;Weiwei Sun
Affiliations:
Fudan University, China;Fudan University, China;Fudan University, China;Fudan University, China;Fudan University, China;Fudan University, China;Institute of Computer Science & Technology, Peking University, China
Venue:
WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Year:
2013

Citing 7
Cited 0

MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Opearting Systems Design & Implementation - Volume 6
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Pregel: a system for large-scale graph processing

Proceedings of the 2010 ACM SIGMOD International Conference on Management of data
Twister: a runtime for iterative MapReduce

Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
MapReduce online

NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
HaLoop: efficient iterative data processing on large clusters

Proceedings of the VLDB Endowment
Fast personalized PageRank on MapReduce

Proceedings of the 2011 ACM SIGMOD International Conference on Management of data

Quantified Score

Hi-index	0.00

Visualization

Abstract

In order to improve the efficiency of the PageRank algorithm, parallelizing methods, especially the ones based on MapReduce, interest many researchers during the past several years. Previous implementations of the PageRank algorithm on MapReduce ignore the characteristic of locality in distributed systems which is very important to reduce the I/O and network costs. In this paper, we explore the locality property and propose a new method for fast PageRank computation by supporting a subgraph as an input record for map functions. Graph partitioning techniques and a message grouping method are employed to guarantee the efficiency of communication among different subgraphs. Experiments show that our method is significantly more efficient than previous approaches without accuracy loss. The key idea to change the granularity of basic processing units from edges to subgraphs can benefit many other parallelizing algorithms for graph processing.