Real-time robot motion planning using rasterizing computer graphics hardware
SIGGRAPH '90 Proceedings of the 17th annual conference on Computer graphics and interactive techniques
Accelerated volume rendering and tomographic reconstruction using texture mapping hardware
VVS '94 Proceedings of the 1994 symposium on Volume visualization
A real-time procedural shading system for programmable graphics hardware
Proceedings of the 28th annual conference on Computer graphics and interactive techniques
Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware
Cg: a system for programming graphics hardware in a C-like language
ACM SIGGRAPH 2003 Papers
OpenGL® Shading Language
Brook for GPUs: stream computing on graphics hardware
ACM SIGGRAPH 2004 Papers
Glift: Generic, efficient, random-access GPU data structures
ACM Transactions on Graphics (TOG)
Accelerator: using data parallelism to program GPUs for general-purpose uses
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
ACM SIGGRAPH 2007 courses
Larrabee: a many-core x86 architecture for visual computing
ACM SIGGRAPH 2008 papers
Scalable parallel programming with CUDA
ACM SIGGRAPH 2008 classes
Patterns for parallel programming
CUDASA: compute unified device and systems architecture
EG PGV'08 Proceedings of the 8th Eurographics conference on Parallel Graphics and Visualization
In this paper, we present a multi-level programming model for recent GPU-based high-performance computing systems. By combining cooperative stream threads with symmetric multiprocessing (SMP) threads, our model provides a computational framework that scales from multi-GPU environments to GPU-cluster systems. Instead of hiding the execution environment from the programmer behind compiler extensions or metaprogramming techniques, we aim for a solution that both enables optimizations and, to reduce programming effort, provides an abstract mapping of the problem space, code reusability, and virtualization of hardware resources. We evaluate an implementation of our model based on CUDA, OpenMP, and MPI-2 on a complex practical application scenario and discuss its performance scaling behavior.
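The abstract does not describe the model's actual interface. As a rough illustration only, the sketch below shows the general idea of mapping one abstract problem space onto several hardware levels: a 1D index range is split first across cluster nodes, then across the GPUs of each node, then across blocks on each GPU. The assumed level assignment (nodes via MPI, one SMP host thread per GPU, CUDA blocks on each GPU), the function names `split` and `decompose`, and the even-split policy are all hypothetical and are not taken from the paper.

```python
from typing import Dict, Tuple

Range = Tuple[int, int]


def split(start: int, stop: int, parts: int, i: int) -> Range:
    """Return the i-th of `parts` near-equal contiguous sub-ranges of [start, stop)."""
    n = stop - start
    return start + i * n // parts, start + (i + 1) * n // parts


def decompose(n_items: int, n_nodes: int, gpus_per_node: int,
              blocks_per_gpu: int) -> Dict[Tuple[int, int, int], Range]:
    """Hypothetical hierarchical mapping of a 1D problem space.

    Keys are (node, gpu, block); values are half-open index ranges.
    """
    mapping: Dict[Tuple[int, int, int], Range] = {}
    for node in range(n_nodes):
        # Assumed cluster level (e.g. one MPI process per node).
        n0, n1 = split(0, n_items, n_nodes, node)
        for gpu in range(gpus_per_node):
            # Assumed node level (e.g. one SMP host thread per GPU).
            g0, g1 = split(n0, n1, gpus_per_node, gpu)
            for blk in range(blocks_per_gpu):
                # Assumed device level (e.g. one CUDA block per sub-range).
                mapping[(node, gpu, blk)] = split(g0, g1, blocks_per_gpu, blk)
    return mapping


if __name__ == "__main__":
    m = decompose(100, n_nodes=2, gpus_per_node=2, blocks_per_gpu=4)
    print(m[(0, 0, 0)], m[(1, 1, 3)])
```

Because each level only subdivides the range handed down by the level above, the same code serves a single GPU (`n_nodes=1, gpus_per_node=1`) or a cluster, which loosely mirrors the abstract's goal of one problem-space mapping that scales across execution environments.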