Graphics Processing Units (GPUs) have become the platform of choice for accelerating a wide range of data-parallel and task-parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high-performance computing market. The rapid adoption of GPU computing has been greatly aided by the introduction of high-level programming environments such as NVIDIA's CUDA C and Khronos' OpenCL. Because CUDA C has been on the market for several years, the HPC community has developed a large body of applications with it. In this paper we describe Caracal, our implementation of a dynamic translation framework that allows CUDA C programs to run on alternative GPU platforms; here we target the AMD Evergreen family of GPUs. We discuss the compatibility and correctness challenges the translator faces, using specific examples. We analyze the translator's overhead relative to the execution time of several benchmarks, and we compare the quality of the code generated by our framework with that produced by the AMD OpenCL library. Our dynamically translated code performs comparably to the native OpenCL library, expands the opportunities for running CUDA C on new heterogeneous architectures, and provides a vehicle for evaluating compiler optimizations in the future.
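As a concrete illustration (a minimal sketch, not code from the paper), the following is the kind of unmodified CUDA C program such a framework must support: the kernel is compiled normally for NVIDIA hardware, and a dynamic translator like Caracal must retarget the compiled kernel at run time, remapping CUDA's grid/block thread hierarchy onto the target GPU's execution model.

#include <cstdio>
#include <cuda_runtime.h>

// saxpy: y = a*x + y, one thread per element. A dynamic translator
// must remap this grid/block geometry and the kernel's operations
// onto the target architecture's execution model.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch with 256 threads per block; the runtime translation
    // layer intercepts this launch and executes it on the target GPU.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f (expect 4.0)\n", hy[0]);
    cudaFree(dx); cudaFree(dy);
    delete[] hx; delete[] hy;
    return 0;
}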