APR: A Novel Parallel Repacking Algorithm for Efficient GPGPU Parallel Code Transformation

  • Authors:
  • Yulong Yu; Xubin He; He Guo; Sihui Zhong; Yuxin Wang; Xin Chen; Weijun Xiao

  • Affiliations:
  • School of Software Technology, Dalian University of Technology, Dalian, China; Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA; School of Software Technology, Dalian University of Technology, Dalian, China; School of Software Technology, Dalian University of Technology, Dalian, China; School of Computer Science and Technology, Dalian University of Technology, Dalian, China; School of Software Technology, Dalian University of Technology, Dalian, China; Department of Electrical and Computer Engineering, Virginia Commonwealth University, Richmond, VA, USA

  • Venue:
  • Proceedings of Workshop on General Purpose Processing Using GPUs
  • Year:
  • 2014

Abstract

General-purpose graphics processing units (GPGPUs) bring an opportunity to improve the performance of many applications. However, exploiting parallelism is unproductive in current programming frameworks such as CUDA and OpenCL. Programmers have to consider and deal with many GPGPU architecture details, so it is a challenge to balance programmability against the efficiency of performance tuning. Parallel Repacking (PR) is a popular performance tuning approach for GPGPU applications that improves performance by changing the parallel granularity. Existing code transformation algorithms using PR increase productivity, but they do not cover enough code patterns and do not provide effective code error detection. In this paper, we propose a novel parallel repacking algorithm (APR) that covers a wide range of code patterns and improves efficiency. We develop an efficient code model that expresses a GPGPU program as a recursive statement sequence and introduces the concept of a singular statement. Building upon this model, APR uses appropriate transformation rules for singular and non-singular statements to generate the repacked code. A recursive transformation is performed when APR encounters a branching or loop singular statement. Additionally, singular statements unify the transformation for barriers and data sharing, and enable APR to detect barrier errors. Experimental results based on a prototype show that our proposed APR covers more code patterns than existing solutions such as the automatic thread coarsening in Crest, and that code repacked by APR achieves a speedup of up to 3.28X, in some cases even higher than manually tuned repacked code.
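To make the idea of changing the parallel granularity concrete, the following is a minimal CPU-side sketch in Python, not the APR algorithm itself. It simulates one thread block whose kernel has two stages separated by a barrier, and then a repacked version in which each physical thread does the work of `factor` logical threads. All names here (`run_block_original`, `run_block_repacked`, `factor`) are hypothetical and chosen for illustration. Treating the barrier as the point where the per-thread loop is split is an assumption based on how thread coarsening is commonly done; the abstract does not spell out APR's actual transformation rules.

```python
# Hypothetical CPU-side simulation of a GPU thread block.
# None of these names come from the paper; this only illustrates the
# general idea of repacking (coarsening) threads around a barrier.

def run_block_original(data):
    """Fine-grained version: one logical thread per element."""
    n = len(data)
    shared = [0] * n   # stands in for block-shared memory
    out = [0] * n

    # Stage 1 (before the barrier): each thread fills its own shared slot.
    for tid in range(n):
        shared[tid] = data[tid] * 2

    # Barrier: every stage-1 write is complete before stage 2 begins.

    # Stage 2 (after the barrier): each thread reads a neighbour's slot.
    for tid in range(n):
        out[tid] = shared[tid] + shared[(tid + 1) % n]
    return out


def run_block_repacked(data, factor):
    """Coarse-grained version: each physical thread covers `factor` logical threads."""
    n = len(data)
    if n % factor != 0:
        raise ValueError("factor must divide the number of logical threads")
    physical_threads = n // factor
    shared = [0] * n
    out = [0] * n

    # Stage 1 is replicated `factor` times inside each physical thread.
    for ptid in range(physical_threads):
        for k in range(factor):
            tid = ptid * factor + k   # logical thread id being emulated
            shared[tid] = data[tid] * 2

    # The barrier is executed once per physical thread, not once per logical
    # thread, so the replication loop is split at this point.

    # Stage 2 is replicated in a second loop after the barrier.
    for ptid in range(physical_threads):
        for k in range(factor):
            tid = ptid * factor + k
            out[tid] = shared[tid] + shared[(tid + 1) % n]
    return out


if __name__ == "__main__":
    data = list(range(8))
    print(run_block_original(data))
    print(run_block_repacked(data, factor=4))
```

Both functions produce the same output for any `factor` that divides the input length. The loop split matters: if both stages were fused into a single replication loop, a logical thread could read a neighbour's shared slot before it had been written. Misplaced barriers make a repacking transformation unsafe in a similar way.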