Hierarchical merge for scalable MapReduce

Authors:
Xinyu Que;Yandong Wang;Cong Xu;Weikuan Yu
Affiliations:
Auburn University, Auburn, USA;Auburn University, Auburn, USA;Auburn University, Auburn, USA;Auburn University, Auburn, USA
Venue:
Proceedings of the 2012 workshop on Management of big data systems
Year:
2012

Citing 15
Cited 0

MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6

Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture

Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques

Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)

Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems

Harnessing input redundancy in a MapReduce framework
Proceedings of the 2010 ACM Symposium on Applied Computing

MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development

Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques

MapReduce online
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation

Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation

Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation

Hadoop acceleration through network levitated merge
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Purlieus: locality-aware resource allocation for MapReduce in a cloud
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis

Using active NVRAM for I/O staging
Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities

Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems

Abstract

MapReduce is a popular parallel processing framework for large-scale data analytics. However, its data shuffling phase is a significant performance bottleneck. Our previous work, Hadoop-A, provides a network-levitated merge algorithm with pipelined shuffle, merge, and reduce phases for fast data processing. Our further analysis shows that the memory usage of Hadoop-A limits its scalability for petabyte datasets. In this paper, we propose Hierarchical Merge to reduce the memory buffer usage of Hadoop-A and enable scalable data processing. Our experimental results demonstrate that, while providing memory resource scalability, Hierarchical Merge retains the benefits of Hadoop-A and reduces execution time by 27% compared to the original Hadoop. Furthermore, Hierarchical Merge reduces disk I/O accesses by as much as 34%.
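
The abstract does not spell out the algorithm, so the following is only a generic illustration of the idea behind a multi-level merge, not the authors' implementation. It merges sorted runs in levels so that no single merge step reads from more than `fan_in` inputs at once, which is one common way to bound the number of buffers a merge needs. The function name `hierarchical_merge` and the `fan_in` parameter are hypothetical, and a real system would stream or spill intermediate runs rather than hold them in Python lists.

```python
import heapq
from typing import List, Sequence


def hierarchical_merge(runs: Sequence[Sequence[int]], fan_in: int = 4) -> List[int]:
    """Merge sorted runs level by level (illustrative sketch only).

    Each merge step consumes at most `fan_in` runs, so the number of
    inputs that must be buffered at the same time is bounded by `fan_in`
    instead of growing with the total number of runs.
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")

    # Level 0: the original sorted runs (copied so inputs are not mutated).
    level = [list(run) for run in runs]

    # Repeatedly merge groups of up to `fan_in` runs into one longer run
    # until a single fully merged run remains.
    while len(level) > 1:
        level = [
            list(heapq.merge(*level[i:i + fan_in]))
            for i in range(0, len(level), fan_in)
        ]

    return level[0] if level else []


if __name__ == "__main__":
    segments = [[1, 4, 9], [2, 3, 10], [0, 8], [5, 6, 7], [11]]
    print(hierarchical_merge(segments, fan_in=2))
```

With `fan_in=2` the five segments above are merged in three levels (5 runs, then 3, then 2, then 1). A larger `fan_in` needs fewer levels but more simultaneously open inputs, which is the trade-off a buffer-constrained merge has to balance.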