Task management for irregular-parallel workloads on the GPU

Authors:
Stanley Tzeng; Anjul Patney; John D. Owens
Affiliations:
University of California, Davis (all three authors)
Venue:
Proceedings of the Conference on High Performance Graphics
Year:
2010

Abstract

We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical to achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from a single monolithic task queue to distributed queuing with task stealing and donation. On irregular workloads, we show that both centralized queues and distributed queues without stealing or donation incur more than 100 times as much idle time as our task-stealing and task-donation queues. We prefer task donation because it performs comparably to task stealing while incurring less memory overhead. To support this analysis, we use an artificial task-management system that monitors performance and memory usage to quantify the impact of these techniques. We validate our results by implementing a Reyes renderer, with its irregular split-and-dice workload, that achieves real-time frame rates on a single GPU.
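To make the contrast between unbalanced distributed queues, task stealing, and task donation concrete, here is a toy, round-synchronous Python simulation. It is a sketch under stated assumptions, not the authors' GPU implementation. Each "worker" stands in for a GPU block with its own local queue. The split rule `children`, the `capacity` threshold, and the "fullest victim" / "emptiest target" heuristics are all hypothetical choices made for illustration. The model ignores atomic contention and memory latency, so it cannot reproduce the centralized-queue results reported above. It only counts idle worker-rounds and peak queue occupancy.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Stats:
    rounds: int = 0       # synchronous rounds until every queue is drained
    idle_steps: int = 0   # worker-rounds in which a worker had no task to run
    peak_queue: int = 0   # largest occupancy observed in any one local queue


def children(depth):
    """Hypothetical split-and-dice rule: a task of depth d > 0 'splits' into
    two children of depth d - 1; a depth-0 task is 'diced' and spawns nothing."""
    return [depth - 1, depth - 1] if depth > 0 else []


def simulate(initial, num_workers, policy, capacity=4):
    """Round-synchronous toy model of per-worker task queues (one per GPU block).

    initial: dict mapping worker index -> list of starting task depths
    policy:  'none'   - distributed queues with no load balancing
             'steal'  - an empty worker steals the oldest task from the fullest queue
             'donate' - a worker holding more than `capacity` tasks gives its
                        oldest tasks to the emptiest queue that still has room
    """
    if policy not in ("none", "steal", "donate"):
        raise ValueError(f"unknown policy: {policy!r}")
    queues = [deque() for _ in range(num_workers)]
    for w, tasks in initial.items():
        queues[w].extend(tasks)
    stats = Stats()
    while any(queues):
        stats.rounds += 1
        if policy == "steal":
            for w in range(num_workers):
                if not queues[w]:
                    victim = max(range(num_workers), key=lambda v: len(queues[v]))
                    if len(queues[victim]) > 1:  # only steal if the victim keeps work
                        queues[w].append(queues[victim].popleft())
        # Each worker runs at most one task per round (owner pops newest, LIFO).
        spawned = [[] for _ in range(num_workers)]
        for w in range(num_workers):
            if queues[w]:
                spawned[w] = children(queues[w].pop())
            else:
                stats.idle_steps += 1
        for w in range(num_workers):
            queues[w].extend(spawned[w])
            if policy == "donate":
                while len(queues[w]) > capacity:
                    target = min(range(num_workers), key=lambda v: len(queues[v]))
                    if len(queues[target]) >= capacity:  # every queue full: overflow
                        break
                    queues[target].append(queues[w].popleft())
        stats.peak_queue = max(stats.peak_queue, max(len(q) for q in queues))
    return stats


if __name__ == "__main__":
    # All work starts on worker 0: a highly irregular initial distribution.
    for policy in ("none", "steal", "donate"):
        print(policy, simulate({0: [6]}, num_workers=4, policy=policy))
```

In this sketch the owner takes its newest task while thieves and donors move the oldest one, mirroring the usual work-stealing deque convention. Donation keeps every local queue at or below `capacity` unless all queues are full. That is one plausible reading of why donation can get by with less queue memory than stealing, but the toy model makes no quantitative claim about the GPU results.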