The efficiency of MapReduce in parallel external memory

Authors:
Gero Greiner;Riko Jacob
Affiliations:
Institute of Theoretical Computer Science, ETH Zurich, Switzerland;Institute of Theoretical Computer Science, ETH Zurich, Switzerland
Venue:
LATIN'12 Proceedings of the 10th Latin American international conference on Theoretical Informatics
Year:
2012



Abstract

Since its introduction in 2004, the MapReduce framework has become one of the standard approaches in massive distributed and parallel computation. In contrast to its intensive use in practice, its theoretical footing is still limited, and little work has been done to put MapReduce on a par with the major computational models. Following pioneering work that relates the MapReduce framework to PRAM and BSP in their macroscopic structure, we focus on the functionality provided by the framework itself, considered in the parallel external memory (PEM) model. In this setting, we present upper and lower bounds on the parallel I/O complexity of the shuffle step that match up to constant factors. The shuffle step is the single communication phase in which all information of one MapReduce invocation is transferred from map workers to reduce workers. Hence, in contrast to previous work, we move the focus towards the internal communication step. The results we obtain further carry over to the BSP* model. On the one hand, this shows how much complexity can be "hidden" for an algorithm expressed in MapReduce compared to PEM. On the other hand, our results bound the worst-case performance loss of the MapReduce approach in terms of I/O efficiency.
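
To make the shuffle step described in the abstract concrete, here is a minimal in-memory sketch in Python. It is an illustration only, not the authors' PEM algorithm and not the API of any real framework. The function names (map_phase, shuffle, reduce_phase, word_count) and the hash-based assignment of keys to reduce workers are assumptions chosen for this example. The sketch also ignores the block transfers and bounded memory that the I/O analysis is actually about.

```python
from collections import defaultdict


def map_phase(records, map_fn):
    """Run the map function on every record; one output list per record."""
    return [list(map_fn(record)) for record in records]


def shuffle(map_outputs, num_reducers):
    """Move every (key, value) pair from map output to the reducer owning its key.

    Assumption: a key is owned by reducer hash(key) % num_reducers.
    This is the single communication phase of one MapReduce round.
    """
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for pairs in map_outputs:
        for key, value in pairs:
            buckets[hash(key) % num_reducers][key].append(value)
    return buckets


def reduce_phase(buckets, reduce_fn):
    """Run the reduce function once per key on all values collected for it."""
    result = {}
    for bucket in buckets:
        for key, values in bucket.items():
            result[key] = reduce_fn(key, values)
    return result


def word_count(lines, num_reducers=3):
    """Toy MapReduce round: count word occurrences across all lines."""
    mapped = map_phase(lines, lambda line: ((word, 1) for word in line.split()))
    shuffled = shuffle(mapped, num_reducers)
    return reduce_phase(shuffled, lambda key, values: sum(values))
```

For example, word_count(["a b a", "b c"]) groups all pairs with the same word at one reducer before summing. In a real setting the cost of shuffle is dominated by moving these pairs between workers, which is the step the bounds in the abstract refer to.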