Hierarchical merge for scalable MapReduce

  • Authors:
  • Xinyu Que;Yandong Wang;Cong Xu;Weikuan Yu

  • Affiliations:
  • Auburn University, Auburn, USA;Auburn University, Auburn, USA;Auburn University, Auburn, USA;Auburn University, Auburn, USA

  • Venue:
  • Proceedings of the 2012 workshop on Management of big data systems
  • Year:
  • 2012

Abstract

MapReduce is a popular parallel processing framework for large-scale data analytics. However, its data shuffling phase is a significant performance bottleneck. Our previous work, Hadoop-A, provides a network-levitated merge algorithm that pipelines the shuffle, merge, and reduce phases for fast data processing. Further analysis shows that Hadoop-A's memory resource usage limits its scalability when processing petabyte-scale datasets. In this paper, we propose Hierarchical Merge, which reduces the memory buffer usage of Hadoop-A and enables scalable data processing. Our experimental results demonstrate that, while providing memory resource scalability, Hierarchical Merge retains the benefits of Hadoop-A and reduces execution time by 27% compared with the original Hadoop. Furthermore, Hierarchical Merge reduces disk I/O accesses by as much as 34%.
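The abstract does not describe the algorithm itself. As a rough illustration of why merging in levels can bound memory buffer usage, here is a minimal sketch of a generic multi-level k-way merge. It is not the authors' Hierarchical Merge. The function names `hierarchical_merge` and `merge_levels` and the `fan_in` parameter are hypothetical. The sketch assumes each sorted run needs one input buffer while it is being merged, so capping the number of runs merged at once (`fan_in`) caps the buffers held at any moment.

```python
import heapq
from typing import List, Sequence


def merge_group(group: Sequence[List[int]]) -> List[int]:
    # k-way merge of one group of sorted runs; conceptually this needs
    # one input buffer per run in the group.
    return list(heapq.merge(*group))


def hierarchical_merge(runs: List[List[int]], fan_in: int) -> List[int]:
    """Hypothetical sketch: merge sorted runs level by level, never
    merging more than fan_in runs at once."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    if not runs:
        return []
    level = runs
    while len(level) > 1:
        # Each level shrinks the number of runs by roughly a factor of fan_in.
        level = [merge_group(level[i:i + fan_in])
                 for i in range(0, len(level), fan_in)]
    return level[0]


def merge_levels(num_runs: int, fan_in: int) -> int:
    """Number of merge levels needed to reduce num_runs runs to one."""
    levels = 0
    while num_runs > 1:
        num_runs = -(-num_runs // fan_in)  # ceiling division
        levels += 1
    return levels
```

For example, `merge_levels(16, 4)` gives 2 levels (16 runs, then 4, then 1). The trade-off this sketch exposes is fewer simultaneous buffers in exchange for more merge levels. This in-memory sketch does not model how a real system stores or pipelines intermediate results between levels.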