Accelerating MapReduce with Distributed Memory Cache

Authors:
Shubin Zhang; Jizhong Han; Zhiyong Liu; Kai Wang; Shengzhong Feng
Venue:
ICPADS '09 Proceedings of the 2009 15th International Conference on Parallel and Distributed Systems
Year:
2009

Citing 0
Cited 5

A Load-Driven Task Scheduler with Adaptive DSC for MapReduce
GREENCOM '11 Proceedings of the 2011 IEEE/ACM International Conference on Green Computing and Communications

A Distributed Cache for Hadoop Distributed File System in Real-Time Cloud Services
GRID '12 Proceedings of the 2012 ACM/IEEE 13th International Conference on Grid Computing

Optimizing and Tuning MapReduce Jobs to Improve the Large-Scale Data Analysis Process
International Journal of Intelligent Systems

Cache conscious star-join in MapReduce environments
Proceedings of the 2nd International Workshop on Cloud Intelligence

SHadoop: Improving MapReduce performance by optimizing job execution mechanism in Hadoop clusters
Journal of Parallel and Distributed Computing

Abstract

MapReduce is a partition-based parallel programming model and framework that enables easy development of scalable parallel programs on clusters of commodity machines. To let time-intensive applications benefit from MapReduce on small-scale clusters, this paper proposes a new method that improves MapReduce performance by using a distributed memory cache as a high-speed channel between map tasks and reduce tasks. Map outputs are sent to the distributed memory cache, where reduce tasks can fetch them as soon as they become available. Experimental results show that the prototype performs much better than the original MapReduce implementation on small-scale clusters. To our knowledge, this is the first effort to accelerate MapReduce with the help of a distributed memory cache.
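The data flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' prototype: the `InMemoryCache` class, the `(map_id, partition)` key scheme, the byte-sum partitioner, and the word-count job are all assumptions made for illustration, and a single-process dictionary stands in for a real distributed memory cache.

```python
from collections import defaultdict


class InMemoryCache:
    """Hypothetical stand-in for a distributed memory cache (single process)."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = value

    def get(self, key):
        return self._store.get(key)


def run_map_task(map_id, lines, num_reducers, cache):
    """Count words in `lines` and publish one bucket per reducer to the cache."""
    buckets = [defaultdict(int) for _ in range(num_reducers)]
    for line in lines:
        for word in line.split():
            # Assumed partitioner: deterministic byte sum, because Python's
            # built-in hash() for strings is salted per process.
            partition = sum(word.encode("utf-8")) % num_reducers
            buckets[partition][word] += 1
    for partition, bucket in enumerate(buckets):
        # Map output goes to the cache instead of local disk.
        cache.put((map_id, partition), dict(bucket))


def run_reduce_task(partition, num_maps, cache):
    """Merge every map output for `partition`, read directly from the cache."""
    totals = defaultdict(int)
    for map_id in range(num_maps):
        bucket = cache.get((map_id, partition)) or {}
        for word, count in bucket.items():
            totals[word] += count
    return dict(totals)


def word_count(splits, num_reducers=2):
    """Run all map tasks, then all reduce tasks, sharing one cache."""
    cache = InMemoryCache()
    for map_id, lines in enumerate(splits):
        run_map_task(map_id, lines, num_reducers, cache)
    result = {}
    for partition in range(num_reducers):
        result.update(run_reduce_task(partition, len(splits), cache))
    return result
```

Calling `word_count([["a b a"], ["b c"]])` merges the two input splits into a single word-count dictionary. In a real deployment the cache would be a networked service shared by all nodes, and reduce tasks could begin fetching while map tasks are still running; this sketch runs the two phases sequentially for simplicity.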