LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
A characterization and analysis of PTX kernels
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
Rodinia: A benchmark suite for heterogeneous computing
IISWC '09 Proceedings of the 2009 IEEE International Symposium on Workload Characterization (IISWC)
An adaptive performance modeling tool for GPU architectures
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
IISWC '10 Proceedings of the IEEE International Symposium on Workload Characterization (IISWC'10)
Caracal: dynamic translation of runtime environments for GPUs
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
A framework for dynamically instrumenting GPU compute applications within GPU Ocelot
Proceedings of the Fourth Workshop on General Purpose Processing on Graphics Processing Units
A quantitative performance analysis model for GPU architectures
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
GKLEE: concolic verification and test generation for GPUs
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Lynx: A dynamic instrumentation system for data-parallel applications on GPGPU architectures
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Portable mapping of data parallel programs to OpenCL for heterogeneous systems
CGO '13 Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO)
Hi-index | 0.00 |
Dynamic instrumentation of GPGPU binaries makes possible real-time introspection methods for performance debugging, correctness checks, workload characterization, and runtime optimization. Such instrumentation involves inserting code at the instruction level of an application, while the application is running, thereby able to accurately profile data-dependent application behavior. Runtime overheads seen from instrumentation, however, can obviate its utility. This paper shows how a combination of information flow analysis and symbolic execution can be used to alleviate these overheads. The methods and their effectiveness are demonstrated for a variety of GPGPU codes written in OpenCL that run on AMD GPU target backends. Kernels that can be analyzed entirely via symbolic execution need not be instrumented, thus eliminating kernel runtime overheads altogether. For the remaining GPU kernels, our results show 5-38% improvements in kernel runtime overheads.