MARIANE: MApReduce Implementation Adapted for HPC Environments

Authors:
Zacharia Fadika; Elif Dede; Madhusudhan Govindaraju; Lavanya Ramakrishnan
Venue:
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing
Year:
2011

Abstract

MapReduce is becoming an increasingly popular framework and a potent programming model. Hadoop, the most popular open-source implementation of MapReduce, is built on the Hadoop Distributed File System (HDFS). However, because HDFS is not POSIX compliant, it cannot be fully leveraged by applications running on most existing HPC environments, such as TeraGrid and NERSC. These HPC environments typically support globally shared file systems such as NFS and GPFS. On such resourceful HPC infrastructures, using Hadoop not only creates compatibility issues but also degrades overall performance because of the added overhead of HDFS. This paper presents a MapReduce implementation directly suitable for HPC environments and exposes the design choices that yield better performance in those settings. By leveraging the inherent functions of distributed file systems and abstracting them away from its MapReduce framework, MARIANE (MApReduce Implementation Adapted for HPC Environments) allows the model to be used in an expanding number of HPC environments and delivers better performance in such settings. The paper demonstrates the applicability and high performance of the MapReduce paradigm through MARIANE, an implementation designed for clustered and shared-disk file systems and, as such, not dedicated to a specific MapReduce solution. It identifies the components and trade-offs necessary for this model and quantifies the performance gains exhibited by our approach in distributed environments over Apache Hadoop in a data-intensive setting, on the Magellan testbed at the National Energy Research Scientific Computing Center (NERSC).