We explore software mechanisms for managing irregular tasks on graphics processing units (GPUs). We demonstrate that dynamic scheduling and efficient memory management are critical to achieving high efficiency on irregular workloads. We experiment with several task-management techniques, ranging from a single monolithic task queue to distributed queuing with task stealing and task donation. On irregular workloads, we show that both centralized and distributed queues incur more than 100 times as much idle time as our task-stealing and task-donation queues. We prefer task donation because it performs comparably to task stealing while incurring less memory overhead. To help in this analysis, we use an artificial task-management system that monitors performance and memory usage to quantify the impact of these different techniques. We validate our results by implementing a Reyes renderer whose irregular split-and-dice workload achieves real-time frame rates on a single GPU.
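To make the difference between stealing and donation concrete, the following is a minimal sketch, not the system described above. It is a single-threaded, round-robin CPU simulation in C++ rather than GPU code, and every name in it (`Task`, `Stats`, `simulate`, `bin_capacity`, `dice_threshold`) is hypothetical. It assumes a split-and-dice style workload in which a task larger than a threshold splits into two children and a small task is "diced" (finished). A worker with an empty bin steals from another bin; a worker whose bounded bin is full donates new tasks to a bin with room.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical task: `size` decides whether the task is split or diced.
struct Task { int size; };

struct Stats { int diced = 0; int steals = 0; int donations = 0; int idle_steps = 0; };

// Round-robin simulation of per-worker bins with stealing and donation.
// This is an illustrative assumption, not the scheduler from the abstract.
Stats simulate(int workers, std::size_t bin_capacity, int root_size, int dice_threshold) {
    assert(workers > 0 && root_size > 0 && dice_threshold > 0);
    std::vector<std::deque<Task>> bins(workers);
    bins[0].push_back({root_size});
    Stats s;

    auto any_work = [&]() -> bool {
        for (const auto& b : bins) if (!b.empty()) return true;
        return false;
    };

    // Donation: keep the task locally if there is room, otherwise hand it
    // to the next bin that has space.
    auto enqueue = [&](int w, Task t) {
        for (int k = 0; k < workers; ++k) {
            int target = (w + k) % workers;
            if (bins[target].size() < bin_capacity) {
                if (k > 0) ++s.donations;
                bins[target].push_back(t);
                return;
            }
        }
        // Simplification: if every bin is full, let the local bin overflow.
        bins[w].push_back(t);
    };

    while (any_work()) {
        for (int w = 0; w < workers; ++w) {
            if (bins[w].empty()) {
                // Stealing: take the oldest task from the next non-empty bin.
                int victim = -1;
                for (int k = 1; k < workers; ++k) {
                    int v = (w + k) % workers;
                    if (!bins[v].empty()) { victim = v; break; }
                }
                if (victim < 0) { ++s.idle_steps; continue; }
                bins[w].push_back(bins[victim].front());
                bins[victim].pop_front();
                ++s.steals;
            }
            Task t = bins[w].back();
            bins[w].pop_back();
            if (t.size <= dice_threshold) {
                ++s.diced;                      // small enough: dice it
            } else {
                int half = t.size / 2;          // too large: split in two
                enqueue(w, {half});
                enqueue(w, {t.size - half});
            }
        }
    }
    return s;
}
```

Calling `simulate(4, 2, 64, 1)` runs four simulated workers with two-entry bins on a root task of size 64. The `steals`, `donations`, and `idle_steps` counters give a rough feel for how much balancing each mechanism performs, but they say nothing about real GPU performance, where contention on atomic operations and memory behavior dominate.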