LEMO-MR: Low Overhead and Elastic MapReduce Implementation Optimized for Memory and CPU-Intensive Applications

Authors:
Zacharia Fadika;Madhusudhan Govindaraju
Venue:
CLOUDCOM '10 Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science
Year:
2010

Citing 0
Cited 6

MARIANE: MApReduce Implementation Adapted for HPC Environments
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

Benchmarking MapReduce Implementations for Application Usage Scenarios
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

Scalable and Distributed Processing of Scientific XML Data
GRID '11 Proceedings of the 2011 IEEE/ACM 12th International Conference on Grid Computing

MARLA: MapReduce for Heterogeneous Clusters
CCGRID '12 Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012)

MRBS: Towards Dependability Benchmarking for Hadoop MapReduce
Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops

Performance Evaluation of a MongoDB and Hadoop Platform for Scientific Data Analysis
Proceedings of the 4th ACM workshop on Scientific cloud computing

Abstract

Since its inception, MapReduce has frequently been associated with Hadoop and large-scale datasets. Its deployment in the cloud at Amazon, and its use at Yahoo! and Facebook for large-scale distributed document indexing and database building, among other tasks, have thrust MapReduce to the forefront of the data processing application domain. The applicability of the paradigm, however, extends far beyond data-intensive applications and disk-based systems: it can also be brought to bear on small but CPU-intensive distributed applications. In this work, we focus both on the performance of processing large-scale hierarchical data in distributed scientific applications and on the processing of smaller but demanding inputs, primarily used in diskless, memory-resident I/O systems. We present LEMO-MR (Low overhead, Elastic, configurable for in-Memory applications, and on-Demand fault tolerance), an optimized implementation of MapReduce for both on-disk and in-memory applications. We describe its architecture and identify not only the necessary components of this model but also the trade-offs and factors to be considered. We show the efficacy of our implementation in terms of the potential speedup that can be achieved for representative data sets used by cloud applications. Finally, we quantify the performance gains exhibited by our MapReduce implementation over Apache Hadoop in a compute-intensive environment.