The raw compute performance of today's graphics processors is truly amazing. With peak performance of over 60 GFLOPS, a modern graphics processor (GPU) dwarfs the compute power of a commodity CPU, at a price of only a few hundred dollars. As the programmability and performance of graphics hardware continue to increase, many researchers are turning to the GPU to solve computationally intensive problems previously handled by general-purpose CPUs. The challenge, however, is how to retarget these processors from game rendering to general computation such as numerical modeling, scientific computing, or signal processing. Traditional graphics APIs abstract the GPU as a rendering device that operates on textures, triangles, and pixels. Mapping an algorithm onto these primitives is not straightforward, even for the most advanced graphics developers. In this dissertation, we explore the concept of stream computing with GPUs. We describe the stream processor abstraction, and we show how this abstraction and its corresponding programming model can efficiently represent computation on the GPU. To formalize the model, we present Brook for GPUs, a programming system for general-purpose computation on programmable graphics hardware. Brook extends C with simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor. We present a compiler and runtime system that abstract and virtualize many aspects of graphics hardware. In addition, we analyze the effectiveness of the GPU as a streaming processor and evaluate the performance of a collection of benchmark applications against their CPU implementations. For a variety of the applications explored in this dissertation, our Brook implementations perform up to seven times faster than their CPU counterparts.
We also discuss some of the algorithmic decisions that are critical for efficient execution when using the stream programming model on the GPU.
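The stream programming model described above can be illustrated with a minimal sketch in plain C. This is not Brook syntax and not the dissertation's compiler or runtime. The names `stream_t`, `saxpy_kernel`, and `run_saxpy` are hypothetical and were chosen only for illustration.

The sketch assumes that a stream is a flat array of independent elements. A kernel is written as a function over single elements, and a runtime applies that kernel to every element. On a GPU those independent invocations would execute in parallel. Here they simply run in a sequential loop.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stream type (not Brook's API): a flat collection of
   independent float elements. */
typedef struct {
    float *data;
    size_t length;
} stream_t;

/* Kernel body: computes one output element from the corresponding
   input elements. It reads only its own elements and has no side
   effects, so every invocation is independent of the others. */
static float saxpy_kernel(float a, float x, float y) {
    return a * x + y;
}

/* Hypothetical "kernel launch": applies saxpy_kernel to each element.
   A stream processor could run these iterations in parallel; this
   sketch runs them sequentially on the CPU. */
static void run_saxpy(float a, const stream_t *x, const stream_t *y,
                      stream_t *out) {
    /* Input streams must cover every output element. */
    assert(x->length >= out->length && y->length >= out->length);
    for (size_t i = 0; i < out->length; ++i) {
        out->data[i] = saxpy_kernel(a, x->data[i], y->data[i]);
    }
}
```

Consider the inputs `a = 2`, `x = {1, 2, 3, 4}`, and `y = {10, 20, 30, 40}`. With these, `run_saxpy` fills a four-element output stream with `{12, 24, 36, 48}`.

The kernel never reads a neighbor's result, so the iteration order does not matter. That independence is the property a stream processor relies on to execute kernel invocations in parallel.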