Accelerate MapReduce on GPUs with multi-level reduction

Authors:
Ran Zheng; Kai Liu; Hai Jin; Qin Zhang; Xiaowen Feng
Affiliations:
Huazhong University of Science and Technology, Wuhan, China (all authors)
Venue:
Proceedings of the 5th Asia-Pacific Symposium on Internetware
Year:
2013

Abstract

As Graphics Processing Units (GPUs) become increasingly popular for general-purpose computing, more attention has been paid to building frameworks that provide convenient interfaces for GPU programming. MapReduce greatly simplifies the programming of data-parallel applications in cloud computing environments, and it is also naturally suited to GPUs. However, recent reduction-based MapReduce implementations on GPUs have a problem: their performance degrades dramatically when handling a massive number of distinct keys, because the data cannot be stored entirely in the tiny shared memory. This paper proposes Jupiter, a new MapReduce framework on GPUs with a continuous reduction structure. Jupiter introduces two improvements: a multi-level reduction scheme tailored to the GPU memory hierarchy, and a frequency-based cache policy for key-value pairs in shared memory. Shared memory is thereby used efficiently across data-parallel applications, whether they involve few or many distinct keys. Experiments show that Jupiter achieves up to a 3x speedup over the original reduction-based GPU MapReduce framework on applications with many distinct keys.
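
The abstract gives no implementation details, so the following is only a rough host-side sketch of the general idea, not Jupiter's actual design. It models a small, fast table (standing in for shared memory) backed by a large table (standing in for global memory), and keeps the most frequently seen keys in the small table. The class name `TwoLevelReducer`, its methods, and the exact eviction rule are all assumptions made for illustration, and the reduce function is assumed to be associative and commutative.

```python
from collections import Counter


class TwoLevelReducer:
    """Toy model (hypothetical): a small 'shared' table backed by a 'global' table."""

    def __init__(self, shared_capacity, reduce_fn):
        self.shared_capacity = shared_capacity  # max keys in the fast level
        self.reduce_fn = reduce_fn              # assumed associative and commutative
        self.shared = {}         # key -> partial result (small, fast level)
        self.hits = Counter()    # key -> number of times the key was emitted
        self.global_table = {}   # key -> partial result (large, slow level)

    def _merge_global(self, key, value):
        # Second reduction level: fold a partial result into the large table.
        if key in self.global_table:
            self.global_table[key] = self.reduce_fn(self.global_table[key], value)
        else:
            self.global_table[key] = value

    def emit(self, key, value):
        self.hits[key] += 1
        if key in self.shared:
            # First reduction level: reduce in place in the fast table.
            self.shared[key] = self.reduce_fn(self.shared[key], value)
            return
        if len(self.shared) < self.shared_capacity:
            self.shared[key] = value
            return
        # Fast table is full: compare against the least frequently seen resident.
        coldest = min(self.shared, key=lambda k: self.hits[k])
        if self.hits[key] > self.hits[coldest]:
            # Spill the cold key's partial result and cache the hotter key.
            self._merge_global(coldest, self.shared.pop(coldest))
            self.shared[key] = value
        else:
            # The new key is not hot enough: reduce it directly in the slow table.
            self._merge_global(key, value)

    def finish(self):
        # Flush the remaining partial results from the fast level.
        for key, value in self.shared.items():
            self._merge_global(key, value)
        self.shared.clear()
        return dict(self.global_table)
```

In a word-count style workload, each map output would be passed to `emit(word, 1)` with addition as the reduce function, and `finish()` would return the final counts. Because a key's partial results may sit in both levels at once, the final flush is what makes the result complete.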