Efficient Mapping of Irregular C++ Applications to Integrated GPUs

Authors:
Rajkishore Barik; Rashid Kaleem; Deepak Majeti; Brian T. Lewis; Tatiana Shpeisman; Chunling Hu; Yang Ni; Ali-Reza Adl-Tabatabai
Affiliations:
Intel Labs, Santa Clara, CA; University of Texas, Austin, TX; Rice University, Houston, TX; Intel Labs, Santa Clara, CA; Intel Labs, Santa Clara, CA; Intel Labs, Santa Clara, CA; Google Inc., Mountain View, CA; Google Inc., Mountain View, CA
Venue:
Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
Year:
2014

Abstract

There is growing interest in using GPUs to accelerate general-purpose computation because they offer the potential of massive parallelism with reduced energy consumption. This interest has been encouraged by the ubiquity of integrated processors that combine a GPU and CPU on the same die, lowering the cost of offloading work to the GPU. However, while the majority of effort has focused on GPU acceleration of regular applications, relatively little is known about the behavior of irregular applications on GPUs. These applications are expected to perform poorly on GPUs without major software engineering effort. We present a compiler framework with support for C++ features that enables GPU acceleration of a wide range of C++ applications with minimal changes. This framework, Concord, includes a low-cost software implementation of shared virtual memory (SVM) that permits seamless sharing of pointer-containing data structures between the CPU and GPU. It also includes compiler optimizations to improve irregular application performance on GPUs. Using Concord, we ran nine irregular C++ programs on two computer systems containing Intel 4th Generation Core processors. One system is an Ultrabook with an integrated HD Graphics 5000 GPU, and the other is a desktop with an integrated HD Graphics 4600 GPU. The nine applications are pointer-intensive and operate on irregular data structures such as trees and graphs; they include face detection, BTree, single-source shortest path, soft-body physics simulation, and breadth-first search. Our results show that GPU acceleration with Concord improves energy efficiency by up to 6.04× on the Ultrabook and 3.52× on the desktop over multicore-CPU execution.