GPFS: A Shared-Disk File System for Large Computing Clusters
FAST '02 Proceedings of the Conference on File and Storage Technologies
MapReduce: simplified data processing on large clusters
Communications of the ACM - 50th anniversary issue: 1958 - 2008
A Dynamic MapReduce Scheduler for Heterogeneous Workloads
GCC '09 Proceedings of the 2009 Eighth International Conference on Grid and Cooperative Computing
Twister: a runtime for iterative MapReduce
Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
SAMR: A Self-adaptive MapReduce Scheduling Algorithm in Heterogeneous Environment
CIT '10 Proceedings of the 2010 10th IEEE International Conference on Computer and Information Technology
Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters
ICPP '10 Proceedings of the 2010 39th International Conference on Parallel Processing
The Hadoop Distributed File System
MSST '10 Proceedings of the 2010 IEEE 26th Symposium on Mass Storage Systems and Technologies (MSST)
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Adapting MapReduce for HPC environments
Proceedings of the 20th international symposium on High performance distributed computing
DELMA: Dynamically ELastic MapReduce Framework for CPU-Intensive Applications
CCGRID '11 Proceedings of the 2011 11th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing
MARIANE: MApReduce Implementation Adapted for HPC Environments
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Benchmarking MapReduce Implementations for Application Usage Scenarios
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Future Generation Computer Systems
MapReduce framework energy adaptation via temperature awareness
Cluster Computing
Hi-index | 0.00 |
MapReduce has gradually become the framework of choice for "big data". The MapReduce model allows for efficient and swift processing of large scale data with a cluster of compute nodes. However, the efficiency here comes at a price. The performance of widely used MapReduce implementations such as Hadoop suffers in heterogeneous and load-imbalanced clusters. We show the disparity in performance between homogeneous and heterogeneous clusters in this paper to be high. Subsequently, we present MARLA, a MapReduce framework capable of performing well not only in homogeneous settings, but also when the cluster exhibits heterogeneous properties. We address the problems associated with existing MapReduce implementations affecting cluster heterogeneity, and subsequently present through MARLA the components and trade-offs necessary for better MapReduce performance in heterogeneous cluster and cloud environments. We quantify the performance gains exhibited by our approach against Apache Hadoop and MARIANE in data intensive and compute intensive applications.