CooMR: cross-task coordination for efficient data management in MapReduce programs

  • Authors:
  • Xiaobing Li;Yandong Wang;Yizheng Jiao;Cong Xu;Weikuan Yu

  • Affiliations:
  • Auburn University, AL;Auburn University, AL;Auburn University, AL;Auburn University, AL;Auburn University, AL

  • Venue:
  • SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2013

Quantified Score

Hi-index 0.00

Visualization

Abstract

Hadoop is a widely adopted open source implementation of MapReduce programming model for big data processing. It represents system resources as available map and reduce slots and assigns them to various tasks. This execution model gives little regard to the need of cross-task coordination on the use of shared system resources on a compute node, which results in task interference. In addition, the existing Hadoop merge algorithm can cause excessive I/O. In this study, we undertake an effort to address both issues. Accordingly, we have designed a cross-task coordination framework called CooMR for efficient data management in MapReduce programs. CooMR consists of three component schemes including cross-task opportunistic memory sharing and log-structured I/O consolidation, which are designed to facilitate task coordination, and the key-based in-situ merge (KISM) algorithm which is designed to enable the sorting/merging of Hadoop intermediate data without actually moving the pairs. Our evaluation demonstrates that CooMR is able to increase task coordination, improve system resource utilization, and significantly speed up the execution time of MapReduce programs.