MARLA: MapReduce for Heterogeneous Clusters

Authors:
Zacharia Fadika;Elif Dede;Jessica Hartog;Madhusudhan Govindaraju
Venue:
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)
Year:
2012

References cited (13)

GPFS: A Shared-Disk File System for Large Computing Clusters
FAST '02 Proceedings of the Conference on File and Storage Technologies

MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008

A Dynamic MapReduce Scheduler for Heterogeneous Workloads
GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing

Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing

Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation

SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment
CIT '10 Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology

Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing

The Hadoop Distributed File System
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)

LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science

Adapting MapReduce for HPC environments
Proceedings of the 20th international symposium on High performance distributed computing

DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications
CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

MARIANE: MApReduce Implementation Adapted for HPC Environments
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

Benchmarking MapReduce Implementations for Application Usage Scenarios
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

Cited by (2)

Monte Carlo simulation on heterogeneous distributed systems: A computing framework with parallel merging and checkpointing strategies
Future Generation Computer Systems

MapReduce framework energy adaptation via temperature awareness
Cluster Computing

Abstract

MapReduce has gradually become the framework of choice for "big data". The MapReduce model allows efficient and swift processing of large-scale data on a cluster of compute nodes. However, this efficiency comes at a price: the performance of widely used MapReduce implementations such as Hadoop suffers in heterogeneous and load-imbalanced clusters. In this paper, we show that the performance disparity between homogeneous and heterogeneous clusters is high. We then present MARLA, a MapReduce framework capable of performing well not only in homogeneous settings but also when the cluster exhibits heterogeneous properties. We address the problems that existing MapReduce implementations face under cluster heterogeneity, and present through MARLA the components and trade-offs necessary for better MapReduce performance in heterogeneous cluster and cloud environments. We quantify the performance gains of our approach against Apache Hadoop and MARIANE in data-intensive and compute-intensive applications.