Fast personalized PageRank on MapReduce

Authors:
Bahman Bahmani;Kaushik Chakrabarti;Dong Xin
Affiliations:
Stanford University, Stanford, CA, USA;Microsoft Research, Redmond, WA, USA;Google Inc., Mountain View, CA, USA
Venue:
Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
Year:
2011

References: 20
Cited by: 15

What is this page known for? Computing Web page reputations

Proceedings of the 9th international World Wide Web conference on Computer networks: the international journal of computer and telecommunications networking
Stable algorithms for link analysis

Proceedings of the 24th annual international ACM SIGIR conference on Research and development in information retrieval
Topic-sensitive PageRank

Proceedings of the 11th international conference on World Wide Web
Scaling personalized web search

WWW '03 Proceedings of the 12th international conference on World Wide Web
The link prediction problem for social networks

CIKM '03 Proceedings of the twelfth international conference on Information and knowledge management
A uniform approach to accelerated PageRank computation

WWW '05 Proceedings of the 14th international conference on World Wide Web
To randomize or not to randomize: space optimal summaries for hyperlink analysis

Proceedings of the 15th international conference on World Wide Web
Map-reduce-merge: simplified relational data processing on large clusters

Proceedings of the 2007 ACM SIGMOD international conference on Management of data
MapReduce: simplified data processing on large clusters

OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Monte Carlo Methods in PageRank Computation: When One Iteration is Sufficient

SIAM Journal on Numerical Analysis
Dryad: distributed data-parallel programs from sequential building blocks

Proceedings of the 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems 2007
Random walks on the click graph

SIGIR '07 Proceedings of the 30th annual international ACM SIGIR conference on Research and development in information retrieval
Combating web spam with trustrank

VLDB '04 Proceedings of the Thirtieth international conference on Very large data bases - Volume 30
Pig latin: a not-so-foreign language for data processing

Proceedings of the 2008 ACM SIGMOD international conference on Management of data
Estimating PageRank on graph streams

Proceedings of the twenty-seventh ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems
SCOPE: easy and efficient parallel processing of massive data sets

Proceedings of the VLDB Endowment
Graph Twiddling in a MapReduce World

Computing in Science and Engineering
PEGASUS: A Peta-Scale Graph Mining System Implementation and Observations

ICDM '09 Proceedings of the 2009 Ninth IEEE International Conference on Data Mining
Design patterns for efficient graph algorithms in MapReduce

Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Data-Intensive Text Processing with MapReduce

Data-Intensive Text Processing with MapReduce

Relational approach for shortest path discovery over large graphs

Proceedings of the VLDB Endowment
Scalable k-means++

Proceedings of the VLDB Endowment
Approximate computation and implicit regularization for very large-scale data analysis

PODS '12 Proceedings of the 31st symposium on Principles of Database Systems
InfoGather: entity augmentation and attribute discovery by holistic matching with web tables

SIGMOD '12 Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
PageRank on an evolving graph

Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
LBSNRank: personalized pagerank on location-based social networks

Proceedings of the 2012 ACM Conference on Ubiquitous Computing
A survey on proximity measures for social networks

Search Computing
Efficient ad-hoc search for personalized PageRank

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Minimal MapReduce algorithms

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
Simulation of database-valued markov chains using SimSQL

Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data
LR-PPR: locality-sensitive, re-use promoting, approximate personalized pagerank computation

Proceedings of the 22nd ACM international conference on information & knowledge management
DrunkardMob: billions of random walks on just a PC

Proceedings of the 7th ACM conference on Recommender systems
Combination of in-memory graph computation with mapreduce: a subgraph-centric method of pagerank

WAIM'13 Proceedings of the 14th international conference on Web-Age Information Management
Incremental and accuracy-aware personalized pagerank through scheduled approximation

Proceedings of the VLDB Endowment
On the embeddability of random walk distances

Proceedings of the VLDB Endowment

Abstract

In this paper, we design a fast MapReduce algorithm for Monte Carlo approximation of personalized PageRank vectors of all the nodes in a graph. The basic idea is to very efficiently perform single random walks of a given length starting at each node in the graph. More precisely, we design a MapReduce algorithm which, given a graph G and a length λ, outputs a single random walk of length λ starting at each node in G. We will show that the number of MapReduce iterations used by our algorithm is optimal among a broad family of algorithms for the problem, and that its I/O efficiency is much better than that of the existing candidates. We will then show how we can use this algorithm to very efficiently approximate all the personalized PageRank vectors. Our empirical evaluation on real-life graph data and in a production MapReduce environment shows that our algorithm is significantly more efficient than all the existing algorithms in the MapReduce setting.
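To make the Monte Carlo idea in the abstract concrete, the following is a minimal single-machine sketch in Python. It is not the paper's MapReduce algorithm: it only illustrates how fixed-length random walks from a node can be turned into an approximate personalized PageRank vector. The function names, the teleport probability `epsilon`, the discounted-visit estimator, the handling of nodes without out-edges, and the final normalization are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def random_walk(graph, start, length, rng):
    """Return a walk [start, v1, ..., v_length] that follows uniform random out-edges.

    `graph` maps each node to a list of its out-neighbors.
    """
    walk = [start]
    node = start
    for _ in range(length):
        neighbors = graph.get(node, [])
        # Assumption: a node with no out-edges sends the walk back to the start node.
        node = rng.choice(neighbors) if neighbors else start
        walk.append(node)
    return walk


def estimate_ppr(graph, source, length, num_walks, epsilon=0.15, seed=0):
    """Approximate the personalized PageRank vector of `source`.

    Each visit at step t of a walk is weighted by epsilon * (1 - epsilon) ** t,
    i.e. the probability that a walk with teleport probability epsilon stops
    exactly at step t. Walks are truncated at `length`, so the scores are
    renormalized to sum to 1 (an assumption of this sketch).
    """
    rng = random.Random(seed)
    scores = defaultdict(float)
    for _ in range(num_walks):
        walk = random_walk(graph, source, length, rng)
        for t, node in enumerate(walk):
            scores[node] += epsilon * (1 - epsilon) ** t
    total = sum(scores.values())
    return {node: score / total for node, score in scores.items()}


if __name__ == "__main__":
    # Hypothetical toy graph: a directed 3-cycle a -> b -> c -> a.
    toy_graph = {"a": ["b"], "b": ["c"], "c": ["a"]}
    print(estimate_ppr(toy_graph, "a", length=20, num_walks=10))
```

In this sketch, nodes closer to the source along the walk receive more weight, which is why a longer walk length mainly refines the tail of the estimate. Producing one such walk per node for the whole graph is the step that the paper distributes over MapReduce.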