Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Optimization principles and application performance evaluation of a multithreaded GPU using CUDA
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Program optimization space pruning for a multithreaded gpu
Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Implementing the PGI Accelerator model
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Accelerating SQL database operations on a GPU with CUDA
Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units
Extending abstract GPU APIs to shared memory
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
memCUDA: map device memory to host memory on GPGPU platform
NPC'10 Proceedings of the 2010 IFIP international conference on Network and parallel computing
OpenMPC: Extended OpenMP Programming and Tuning for GPUs
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Breaking the GPU programming barrier with the auto-parallelising SAC compiler
Proceedings of the sixth workshop on Declarative aspects of multicore programming
Copperhead: compiling an embedded data parallel language
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Unified parallel C for GPU clusters: language extensions and compiler implementation
LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
A platform-independent tool for modeling parallel programs
Proceedings of the 49th Annual Southeast Regional Conference
IWOMP'11 Proceedings of the 7th international conference on OpenMP in the Petascale era
PTask: operating system abstractions to manage GPUs as compute devices
SOSP '11 Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
Chestnut: a GPU programming language for non-experts
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Dynamically managed data for CPU-GPU architectures
Proceedings of the Tenth International Symposium on Code Generation and Optimization
Experiences with high-level programming directives for porting applications to GPUs
Facing the Multicore-Challenge II
A compiler-assisted runtime-prefetching scheme for heterogeneous platforms
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
GPUstore: harnessing GPU computing for storage systems in the OS kernel
Proceedings of the 5th Annual International Systems and Storage Conference
A script-based autotuning compiler system to generate high-performance CUDA code
ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
GPUfs: integrating a file system with GPUs
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Input-aware auto-tuning for directive-based GPU programming
Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
ACM SIGOPS 24th Symposium on Operating Systems Principles
Dandelion: a compiler and runtime for heterogeneous systems
Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles
GPUfs: Integrating a file system with GPUs
ACM Transactions on Computer Systems (TOCS)
On Expressing Strategies for Directive-Driven Multicore Programing Models
Proceedings of Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures and Design Tools and Architectures for Multicore Embedded Computing Platforms
Experiences Developing the OpenUH Compiler and Runtime Infrastructure
International Journal of Parallel Programming
Efficient implementation of data flow graphs on multi-gpu clusters
Journal of Real-Time Image Processing
Hi-index | 0.00 |
The Compute Unified Device Architecture (CUDA) has become a de facto standard for programming NVIDIA GPUs. However, CUDA places on the programmer the burden of packaging GPU code in separate functions, of explicitly managing data transfer between the host memory and various components of the GPU memory, and of manually optimizing the utilization of the GPU memory. Practical experience shows that the programmer needs to make significant code changes, which are often tedious and error-prone, before getting an optimized program. We have designed hiCUDA, a high-level directive-based language for CUDA programming. It allows programmers to perform these tedious tasks in a simpler manner, and directly to the sequential code. Nonetheless, it supports the same programming paradigm already familiar to CUDA programmers. We have prototyped a source-to-source compiler that translates a hiCUDA program to a CUDA program. Experiments using five standard CUDA bechmarks show that the simplicity and flexibility hiCUDA provides come at no expense to performance.