Portable and Transparent Host-Device Communication Optimization for GPGPU Environments

  • Authors:
  • Christos Margiolas;Michael F. P. O'Boyle

  • Affiliations:
  • School of Informatics University of Edinburgh;School of Informatics University of Edinburgh

  • Venue:
  • Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
  • Year:
  • 2014

Quantified Score

Hi-index 0.00

Visualization

Abstract

General purpose graphics processors units (GPU) provide the potential for high computational performance with reduced cost and power. Typically they are employed in heterogeneous settings acting as accelerators. Here an application resides on a host multi-core, dispatching work to the GPU. However, workload dispatch is frequently accompanied by large scale data transfers between the host main memory and the dedicated memories of the GPUs. For many applications, memory allocation and communication overhead can severely reduce the benefits of GPU acceleration. This paper develops an approach that reduces host-device communication overhead for OpenCL applications. It does this without modification or recompilation of the application source code and is portable across platforms. It achieves this by tracing and analyzing calls to the runtime made by the application and then selecting the best platform specific memory allocation and communication policy. This approach was applied to 12 existing OpenCL benchmarks from Parboil and Rodinia suites on 3 different platforms where it gives on average a speedup of 1.51, 1.31 and 1.48, respectively. In certain cases, our approach leads up to a factor of three times improvement over current approaches.