A model of computation for MapReduce

Authors:
Howard Karloff;Siddharth Suri;Sergei Vassilvitskii
Affiliations:
AT&T Labs - Research;Yahoo! Research;Yahoo! Research
Venue:
SODA '10 Proceedings of the twenty-first annual ACM-SIAM symposium on Discrete Algorithms
Year:
2010

Abstract

In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms for processing data at terabyte and petabyte scales. Used daily at companies such as Yahoo!, Google, Amazon, and Facebook, and adopted more recently by several universities, it allows for easy parallelization of data-intensive computations over many machines. One key feature of MapReduce that differentiates it from previous models of parallel computation is that it interleaves sequential and parallel computation. We propose a model of efficient computation using the MapReduce paradigm. Since MapReduce is designed for computations over massive data sets, our model limits the number of machines and the memory per machine to be substantially sublinear in the size of the input. On the other hand, we place very loose restrictions on the computational power of any individual machine: our model allows each machine to perform sequential computations in time polynomial in the size of the original input. We compare MapReduce to the PRAM model of computation. We prove a simulation lemma showing that a large class of PRAM algorithms can be efficiently simulated via MapReduce. The strength of MapReduce, however, lies in the fact that it uses both sequential and parallel computation. We demonstrate how algorithms can take advantage of this fact to compute an MST of a dense graph in only two rounds, as opposed to the Ω(log(n)) rounds needed in the standard PRAM model. We show how to evaluate a wide class of functions using the MapReduce framework. We conclude by applying this result to show how to solve some basic algorithmic problems, such as undirected s-t connectivity, in the MapReduce framework.
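
The two-round MST claim can be made concrete with a small single-process simulation. The abstract does not spell the algorithm out, so the sketch below assumes a vertex-partitioning scheme: vertices are split into k groups, each "reducer" receives the edges induced by one pair of groups and keeps only a local minimum spanning forest, and a final reducer runs a sequential MST algorithm on the surviving edges. The function names (`kruskal`, `two_round_mst`), the random partition, and the parameter `k` are illustrative choices, not the authors' code.

```python
import random
from itertools import combinations


def kruskal(edges):
    """Sequential minimum spanning forest of an edge list [(weight, u, v), ...]."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # edge joins two components, so keep it
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def two_round_mst(vertices, edges, k, seed=0):
    """Simulate the assumed two-round scheme; k is the number of vertex groups."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    group = {v: rng.randrange(k) for v in vertices}

    # Round 1, map: send each edge to every reducer (i, j) whose
    # vertex set V_i ∪ V_j contains both endpoints.
    buckets = {pair: [] for pair in combinations(range(k), 2)}
    for edge in edges:
        _, u, v = edge
        gu, gv = group[u], group[v]
        for pair in buckets:
            if gu in pair and gv in pair:
                buckets[pair].append(edge)

    # Round 1, reduce: each reducer keeps only its local spanning forest.
    # A discarded edge is the heaviest on some cycle, so no MST needs it.
    survivors = set()
    for bucket in buckets.values():
        survivors.update(kruskal(bucket))

    # Round 2: a single reducer finishes sequentially on the sparse remainder.
    return kruskal(survivors)


if __name__ == "__main__":
    rng = random.Random(1)
    verts = list(range(30))
    dense = [(rng.random(), u, v) for u, v in combinations(verts, 2)]
    tree = two_round_mst(verts, dense, k=4)
    print(len(tree), "edges, total weight", round(sum(w for w, _, _ in tree), 4))
```

The sketch illustrates the interleaving the abstract emphasizes: the parallel step only prunes edges, and the heavy lifting in each round is an ordinary sequential algorithm run on one machine. In the model, k would have to be chosen so that every bucket, and the set of survivors, fits within the sublinear per-machine memory limit; the simulation does not enforce that.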