MapReduce is a partition-based parallel programming model and framework that simplifies the development of scalable parallel programs on clusters of commodity machines. To let time-intensive applications benefit from MapReduce on small-scale clusters, this paper proposes a method that improves MapReduce performance by using a distributed memory cache as a high-speed channel between map tasks and reduce tasks. Map outputs written to the distributed memory cache can be fetched by reduce tasks as soon as they become available. Experimental results show that our prototype substantially outperforms the original framework on small-scale clusters. To our knowledge, this is the first effort to accelerate MapReduce with a distributed memory cache.
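The data flow described above can be illustrated with a minimal, single-process sketch. It is not the paper's prototype: the class `InMemoryCache` is a hypothetical stand-in for a distributed memory cache, threads stand in for cluster tasks, and the word-count job, the CRC32 partitioner, and all function names are assumptions chosen for illustration. The point it shows is that reduce tasks start consuming map output while map tasks are still running, instead of waiting for intermediate files.

```python
import threading
import zlib
from collections import Counter, defaultdict


class InMemoryCache:
    """Hypothetical stand-in for a distributed memory cache (one process only)."""

    def __init__(self):
        self._data = defaultdict(list)   # partition id -> pending (key, value) pairs
        self._finished_maps = 0
        self._cond = threading.Condition()

    def put(self, partition, pairs):
        # Called by map tasks: publish output and wake any waiting reducer.
        with self._cond:
            self._data[partition].extend(pairs)
            self._cond.notify_all()

    def mark_map_done(self):
        # Called once by each map task after its last put().
        with self._cond:
            self._finished_maps += 1
            self._cond.notify_all()

    def drain(self, partition, total_maps):
        # Called by reduce tasks: yield pairs as they arrive, stop when all maps are done.
        while True:
            with self._cond:
                while not self._data[partition] and self._finished_maps < total_maps:
                    self._cond.wait()
                batch, self._data[partition] = self._data[partition], []
                done = self._finished_maps >= total_maps
            yield from batch
            if done:
                return


def map_task(cache, text, num_partitions):
    # Emit (word, 1) for each word, routed to a partition by a stable hash.
    for word in text.split():
        partition = zlib.crc32(word.encode("utf-8")) % num_partitions
        cache.put(partition, [(word, 1)])
    cache.mark_map_done()


def reduce_task(cache, partition, total_maps, results):
    # Aggregate counts for one partition while map tasks may still be running.
    counts = Counter()
    for word, n in cache.drain(partition, total_maps):
        counts[word] += n
    results[partition] = dict(counts)


def run_job(splits, num_reducers=2):
    cache, results = InMemoryCache(), {}
    reducers = [threading.Thread(target=reduce_task,
                                 args=(cache, p, len(splits), results))
                for p in range(num_reducers)]
    mappers = [threading.Thread(target=map_task, args=(cache, s, num_reducers))
               for s in splits]
    for t in reducers + mappers:   # reducers are started first and wait on the cache
        t.start()
    for t in reducers + mappers:
        t.join()
    merged = {}
    for part in results.values():
        merged.update(part)        # partitions hold disjoint keys
    return merged
```

Calling `run_job(["a b a", "b c"])` counts words across two input splits. In a real deployment the cache would be a networked service shared by all nodes, and fault tolerance for lost cache entries would need separate handling, which this sketch does not attempt.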