MapReduce: simplified data processing on large clusters
OSDI'04 Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6
Evaluating MapReduce for Multi-core and Multiprocessor Systems
HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Mars: a MapReduce framework on graphics processors
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Phoenix rebirth: Scalable MapReduce on a large-scale shared-memory system
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling
Proceedings of the 5th European conference on Computer systems
Harnessing input redundancy in a MapReduce framework
Proceedings of the 2010 ACM Symposium on Applied Computing
MapReduce for the cell broadband engine architecture
IBM Journal of Research and Development
Tiled-MapReduce: optimizing resource usages of data-parallel applications on multicore with tiling
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
NSDI'10 Proceedings of the 7th USENIX conference on Networked systems design and implementation
Improving MapReduce performance in heterogeneous environments
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Reining in the outliers in map-reduce clusters using Mantri
OSDI'10 Proceedings of the 9th USENIX conference on Operating systems design and implementation
Hadoop acceleration through network levitated merge
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Purlieus: locality-aware resource allocation for MapReduce in a cloud
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Using active NVRAM for I/O staging
Proceedings of the 2nd international workshop on Petascale data analytics: challenges and opportunities
Tarazu: optimizing MapReduce on heterogeneous clusters
ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
MapReduce is a popular parallel processing framework for large-scale data analytics. However, it suffers a significant performance penalty during its data shuffling phase. Our previous work, Hadoop-A, provides a network-levitated merge algorithm with pipelined shuffle, merge, and reduce phases for fast data processing. Our further analysis shows that Hadoop-A's memory resource usage limits its scalability for petabyte-scale datasets. In this paper, we propose Hierarchical Merge, which reduces the memory buffer usage of Hadoop-A and enables scalable data processing. Our experimental results demonstrate that, while providing memory resource scalability, Hierarchical Merge retains the benefits of Hadoop-A and improves execution time by 27% compared to the original Hadoop. Furthermore, Hierarchical Merge reduces disk I/O accesses by as much as 34%.
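The abstract does not say how Hierarchical Merge is implemented. As a rough illustration of the general idea of bounding merge fan-in to cap buffer usage, the following is a minimal sketch under stated assumptions, not the authors' method or Hadoop-A's code. It contrasts a flat k-way merge, which keeps one buffered head record per input segment (so buffer count grows with the number of map outputs), with a multi-level merge that never merges more than `fan_in` segments at once. The `Record` type, the `fan_in` parameter, and both function names are hypothetical, and intermediate runs are held as in-memory Python lists only for simplicity.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record type: a (key, value) pair from a sorted map output segment.
Record = Tuple[str, int]


def kway_merge(segments: List[Iterable[Record]]) -> Iterator[Record]:
    """Flat k-way merge by key.

    heapq.merge keeps one head entry per non-exhausted input in its heap,
    so the number of live buffers grows with len(segments).
    """
    return heapq.merge(*segments, key=lambda r: r[0])


def hierarchical_merge(segments: List[List[Record]], fan_in: int) -> List[Record]:
    """Multi-level merge that merges at most `fan_in` segments at a time.

    Each pass merges consecutive groups of `fan_in` runs into one run, so no
    single merge holds more than `fan_in` heads. Intermediate runs are kept as
    lists here; a real system would stream or spill them instead (assumption).
    """
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    level = [list(s) for s in segments]
    if not level:
        return []
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), fan_in):
            group = level[i:i + fan_in]
            next_level.append(list(kway_merge(group)))
        level = next_level
    return level[0]


if __name__ == "__main__":
    segs = [[("a", 1), ("d", 4)], [("b", 2), ("e", 5)],
            [("c", 3), ("f", 6)], [("g", 7)]]
    print(hierarchical_merge(segs, fan_in=2))
    # [('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', 5), ('f', 6), ('g', 7)]
```

With `fan_in=2` and four segments, the first pass produces two runs and the second pass merges them, so each merge holds at most two heads instead of four. The trade-off in this simple model is extra passes over intermediate data in exchange for a fixed per-merge buffer budget.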