Graphics Processing Units (GPUs) have become the platform of choice for accelerating a wide range of data-parallel and task-parallel applications. Both AMD and NVIDIA have developed GPU implementations targeted at the high-performance computing market. The rapid adoption of GPU computing has been greatly aided by the introduction of high-level programming environments such as NVIDIA's CUDA C and Khronos' OpenCL. Because CUDA C has been on the market for several years, the HPC community has developed a large body of applications with it. In this paper we describe Caracal, our implementation of a dynamic translation framework that allows CUDA C programs to run on alternative GPU platforms; here we target the AMD Evergreen family of GPUs. We discuss the compatibility and correctness challenges the translator faces, using specific examples. We analyze the translator's overhead relative to the execution time of several benchmarks, and we compare the quality of the code generated by our framework with that produced by the AMD OpenCL library. Our dynamically translated code performs comparably to the native OpenCL library, expands the opportunities for running CUDA C on new heterogeneous architectures, and provides a vehicle for evaluating compiler optimizations in the future.
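As a concrete illustration (a minimal sketch, not code from the paper), the following is the kind of unmodified CUDA C program such a framework must support: the kernel is compiled normally for NVIDIA hardware, and a dynamic translator like Caracal must retarget the compiled kernel at run time, remapping CUDA's grid/block thread hierarchy onto the target GPU's execution model.

#include <cstdio>
#include <cuda_runtime.h>

// saxpy: y = a*x + y, one thread per element. A dynamic translator
// must remap this grid/block geometry and the kernel's operations
// onto the target architecture's execution model.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *hx = new float[n], *hy = new float[n];
    for (int i = 0; i < n; ++i) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, hx, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, bytes, cudaMemcpyHostToDevice);

    // Launch with 256 threads per block; the runtime translation
    // layer intercepts this launch and executes it on the target GPU.
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, dx, dy);
    cudaMemcpy(hy, dy, bytes, cudaMemcpyDeviceToHost);

    printf("y[0] = %f (expect 4.0)\n", hy[0]);
    cudaFree(dx); cudaFree(dy);
    delete[] hx; delete[] hy;
    return 0;
}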