Benchmarking MapReduce Implementations for Application Usage Scenarios

Authors:
Zacharia Fadika;Elif Dede;Madhusudhan Govindaraju;Lavanya Ramakrishnan
Venue:
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Year:
2011


Abstract

The MapReduce paradigm provides a scalable model for large-scale data-intensive computing and the associated fault tolerance. With data production increasing daily due to ever-growing application needs, scientific endeavors, and consumption, the MapReduce model and its implementations need to be further evaluated, improved, and strengthened. Several MapReduce frameworks with varying degrees of conformance to the key tenets of the model are available today, each optimized for specific features. HPC application and middleware developers must thus understand the complex dependencies between MapReduce features and their applications. We present a standard benchmark suite for quantifying, comparing, and contrasting the performance of MapReduce platforms under a wide range of representative use cases. We report the performance of three MapReduce implementations on the benchmarks and draw conclusions about their current performance characteristics. The three platforms we chose for evaluation are the widely used Apache Hadoop implementation; Twister, which has been discussed in the literature; and LEMO-MR, our own implementation. Our performance analysis also sheds light on the design decisions available to future implementations and allows Grid researchers to choose the MapReduce implementation that best suits their application's needs.