Design patterns for efficient graph algorithms in MapReduce

Authors:
Jimmy Lin; Michael Schatz
Affiliations:
University of Maryland, College Park; University of Maryland, College Park
Venue:
Proceedings of the Eighth Workshop on Mining and Learning with Graphs
Year:
2010

Abstract

Graphs are analyzed in many important contexts, including ranking search results based on the hyperlink structure of the World Wide Web, module detection in protein-protein interaction networks, and privacy analysis of social networks. Many graphs of interest are difficult to analyze because of their large size, often spanning millions of vertices and billions of edges. As such, researchers have increasingly turned to distributed solutions. In particular, MapReduce has emerged as an enabling technology for large-scale graph processing. However, existing best practices for MapReduce graph algorithms have significant shortcomings that limit performance, especially with respect to partitioning, serializing, and distributing the graph. In this paper, we present three design patterns that address these issues and can be used to accelerate a large class of graph algorithms based on message passing, exemplified by PageRank. Experiments show that applying our design patterns reduces the running time of PageRank on a web graph with 1.4 billion edges by 69%.
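
To make the message-passing formulation concrete, the following is a minimal sketch of one baseline PageRank iteration written as a map step and a reduce step, with the shuffle simulated in memory. It is not the authors' code and does not implement their three design patterns. The function names (`map_vertex`, `reduce_vertex`, `pagerank_iteration`), the damping factor of 0.85, and the in-memory dictionary are assumptions made for illustration. The sketch also ignores dangling vertices, whose rank mass is simply lost here.

```python
from collections import defaultdict


def map_vertex(vertex_id, rank, neighbors):
    """Emit the vertex's adjacency list plus one rank message per out-edge."""
    # The graph structure travels through the shuffle alongside the messages,
    # so that the reducer can rebuild the vertex for the next iteration.
    yield vertex_id, ("graph", neighbors)
    if neighbors:
        share = rank / len(neighbors)
        for neighbor in neighbors:
            yield neighbor, ("mass", share)


def reduce_vertex(vertex_id, values, num_vertices, damping=0.85):
    """Sum incoming rank messages and recover the adjacency list."""
    neighbors = []
    incoming = 0.0
    for kind, payload in values:
        if kind == "graph":
            neighbors = payload
        else:
            incoming += payload
    rank = (1.0 - damping) / num_vertices + damping * incoming
    return vertex_id, rank, neighbors


def pagerank_iteration(graph):
    """Run one iteration over graph: dict of vertex_id -> (rank, neighbors)."""
    grouped = defaultdict(list)  # stands in for the MapReduce shuffle
    for vertex_id, (rank, neighbors) in graph.items():
        for key, value in map_vertex(vertex_id, rank, neighbors):
            grouped[key].append(value)

    result = {}
    for vertex_id, values in grouped.items():
        _, rank, neighbors = reduce_vertex(vertex_id, values, len(graph))
        result[vertex_id] = (rank, neighbors)
    return result


if __name__ == "__main__":
    graph = {
        "a": (1 / 3, ["b", "c"]),
        "b": (1 / 3, ["c"]),
        "c": (1 / 3, ["a"]),
    }
    for vertex_id, (rank, _) in sorted(pagerank_iteration(graph).items()):
        print(vertex_id, round(rank, 4))
```

In this baseline every adjacency list is re-emitted and shuffled on each iteration, and every edge produces its own message. These are the kinds of partitioning, serialization, and distribution costs that the abstract identifies as limiting performance.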