In recent years the MapReduce framework has emerged as one of the most widely used parallel computing platforms for processing data on terabyte and petabyte scales. Used daily at companies such as Yahoo!, Google, Amazon, and Facebook, and adopted more recently by several universities, it allows for easy parallelization of data-intensive computations over many machines. One key feature of MapReduce that differentiates it from previous models of parallel computation is that it interleaves sequential and parallel computation. We propose a model of efficient computation using the MapReduce paradigm. Since MapReduce is designed for computations over massive data sets, our model limits the number of machines and the memory per machine to be substantially sublinear in the size of the input. On the other hand, we place very loose restrictions on the computational power of any individual machine: our model allows each machine to perform sequential computations in time polynomial in the size of the original input. We compare MapReduce to the PRAM model of computation. We prove a simulation lemma showing that a large class of PRAM algorithms can be efficiently simulated via MapReduce. The strength of MapReduce, however, lies in the fact that it uses both sequential and parallel computation. We demonstrate how algorithms can take advantage of this fact to compute a minimum spanning tree (MST) of a dense graph in only two rounds, as opposed to the Ω(log n) rounds needed in the standard PRAM model. We show how to evaluate a wide class of functions using the MapReduce framework. We conclude by applying this result to show how to solve some basic algorithmic problems, such as undirected s-t connectivity, in the MapReduce framework.
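The two-round MST claim can be made concrete with a small sequential simulation. The sketch below is one plausible way to realize it, not a reproduction of the paper's algorithm, whose details the abstract does not give. It assumes vertices are split at random into `k` parts, each pair of parts is handled by one reducer that keeps only a local minimum spanning forest, and a single reducer finishes the job in the second round. The names `kruskal` and `two_round_mst` and the parameters `k` and `seed` are hypothetical, and the "machines" are simulated with an ordinary dictionary rather than a real MapReduce runtime.

```python
import random
from itertools import combinations


def kruskal(num_vertices, edges):
    """Sequential Kruskal: return a minimum spanning forest of (weight, u, v) edges."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((w, u, v))
    return forest


def two_round_mst(num_vertices, edges, k, seed=0):
    """Simulate a two-round MST computation (assumed scheme, k >= 2 parts)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)
    part = [rng.randrange(k) for _ in range(num_vertices)]

    # Round 1, map: route each edge to every reducer (i, j) whose two
    # vertex parts together contain both endpoints of the edge.
    buckets = {pair: [] for pair in combinations(range(k), 2)}
    for edge in edges:
        _, u, v = edge
        pu, pv = part[u], part[v]
        if pu != pv:
            buckets[(min(pu, pv), max(pu, pv))].append(edge)
        else:
            for other in range(k):
                if other != pu:
                    buckets[(min(pu, other), max(pu, other))].append(edge)

    # Round 1, reduce: each reducer runs a sequential algorithm on its
    # subgraph and keeps only the local minimum spanning forest.
    survivors = set()
    for local_edges in buckets.values():
        survivors.update(kruskal(num_vertices, local_edges))

    # Round 2: a single reducer computes the MST of the surviving edges.
    return kruskal(num_vertices, survivors)
```

The filtering step is safe because an edge discarded by a reducer is the heaviest edge on some cycle inside that reducer's subgraph, so it can be left out of a minimum spanning tree of the whole graph. The sketch also shows the interleaving the abstract emphasizes: parallelism across reducers, with a polynomial-time sequential algorithm inside each one.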