hiCUDA: a high-level directive-based language for GPU programming

Authors:
Tianyi David Han; Tarek S. Abdelrahman
Affiliations:
University of Toronto, Toronto, Ontario, Canada; University of Toronto, Toronto, Ontario, Canada
Venue:
Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units
Year:
2009

Abstract

The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host memory and various components of the GPU memory, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, which are often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner and to apply them directly to the sequential code. Nonetheless, it supports the same programming paradigm already familiar to CUDA programmers. We have prototyped a source-to-source compiler that translates a hiCUDA program to a CUDA program. Experiments using five standard CUDA benchmarks show that the simplicity and flexibility hiCUDA provides come at no expense to performance.
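
As an illustration of what "directives applied directly to the sequential code" could look like, here is a minimal sketch of a sequential matrix multiply annotated in the hiCUDA style. The `#pragma hicuda` spellings are recalled from the paper's matrix-multiply example and should be treated as assumptions, not an authoritative reference. The function name `matrix_mul`, the size constants, and the use of array parameters are hypothetical choices made for this sketch. An ordinary C compiler ignores pragmas it does not recognize (possibly with a warning), so the code still builds and runs sequentially.

```c
#include <stddef.h>

#define ROWS  64   /* rows of A and C (illustrative size) */
#define INNER 128  /* columns of A, rows of B (illustrative size) */
#define COLS  32   /* columns of B and C (illustrative size) */

/* Sequential matrix multiply C = A * B with hiCUDA-style annotations.
 * ASSUMPTION: the directive spellings below are recalled from the hiCUDA
 * paper and are not checked against a hiCUDA compiler. A plain C compiler
 * skips them and executes the loops sequentially. */
void matrix_mul(float A[ROWS][INNER], float B[INNER][COLS],
                float C[ROWS][COLS])
{
    /* Allocate GPU global memory; copy the two inputs from the host. */
    #pragma hicuda global alloc A[*][*] copyin
    #pragma hicuda global alloc B[*][*] copyin
    #pragma hicuda global alloc C[*][*]

    /* Mark the region as a kernel: 4x2 thread blocks of 16x16 threads,
     * which covers the 64x32 result with one thread per element. */
    #pragma hicuda kernel matrixMul tblock(4,2) thread(16,16)

    /* Distribute the i and j iterations over thread blocks and threads. */
    #pragma hicuda loop_partition over_tblock over_thread
    for (size_t i = 0; i < ROWS; ++i) {
        #pragma hicuda loop_partition over_tblock over_thread
        for (size_t j = 0; j < COLS; ++j) {
            float sum = 0.0f;
            for (size_t k = 0; k < INNER; ++k) {
                sum += A[i][k] * B[k][j];
            }
            C[i][j] = sum;
        }
    }

    #pragma hicuda kernel_end

    /* Copy the result back to the host and release GPU memory. */
    #pragma hicuda global copyout C[*][*]
    #pragma hicuda global free A B C
}
```

Calling `matrix_mul(A, B, C)` on host arrays of the declared sizes gives the ordinary sequential result under a plain C compiler. A source-to-source compiler of the kind the abstract describes would instead use the directives to generate the separate kernel function, the memory allocation, and the host-device transfers that a CUDA programmer would otherwise write by hand.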