On dynamic load balancing on graphics processors

Authors:
Daniel Cederman; Philippas Tsigas
Affiliations:
Chalmers University of Technology, Göteborg, Sweden; Chalmers University of Technology, Göteborg, Sweden
Venue:
Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Year:
2008

Abstract

To get maximum performance on many-core graphics processors, it is important to balance the workload evenly so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. With the recent advent of scatter operations and atomic hardware primitives, it is now possible to bring some of the more elaborate dynamic load balancing schemes from the conventional SMP systems domain to the graphics processor domain. We have compared four different dynamic load balancing methods to see which one is most suited to the highly parallel world of graphics processors. Three of these methods were lock-free and one was lock-based. We evaluated them on the task of creating an octree partitioning of a set of particles. The experiments showed that synchronization can be very expensive and that new methods that take more advantage of the graphics processors' features and capabilities might be required. They also showed that lock-free methods achieve better performance than blocking ones and that they can be made to scale with increased numbers of processing units.
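
To illustrate why this workload is hard to balance statically, the following is a minimal sequential sketch of octree partitioning driven by a task pool. It is not the authors' GPU implementation and does not reproduce any of the four methods compared; the function name `build_octree`, the parameters `max_per_leaf` and `max_depth`, and the unit-cube domain are assumptions made for illustration only. The point it shows is that a task's cost depends on how many particles its cell holds, and that each split pushes new sub-tasks whose number is only known at run time.

```python
import random
from collections import deque


def build_octree(points, max_per_leaf=4, max_depth=10):
    """Partition 3D points in the unit cube into octree leaves.

    Hypothetical sketch: a single worker drains a task pool. On a parallel
    machine, many workers would share this pool, which is where a dynamic
    load balancing scheme would come in.
    Returns (leaves, tasks_processed); each leaf is (center, half, points).
    """
    # One task per cell: (center, half_width, points_in_cell, depth).
    tasks = deque([((0.5, 0.5, 0.5), 0.5, list(points), 0)])
    leaves = []
    processed = 0
    while tasks:
        center, half, pts, depth = tasks.popleft()
        processed += 1
        # A cell becomes a leaf when it is sparse enough or too deep.
        if len(pts) <= max_per_leaf or depth >= max_depth:
            leaves.append((center, half, pts))
            continue
        # Bucket points by octant: bit i is set when p[i] >= center[i].
        buckets = [[] for _ in range(8)]
        for p in pts:
            octant = sum(1 << i for i in range(3) if p[i] >= center[i])
            buckets[octant].append(p)
        quarter = half / 2
        for octant, bucket in enumerate(buckets):
            if not bucket:
                continue  # empty child cells are skipped in this sketch
            child_center = tuple(
                center[i] + (quarter if (octant >> i) & 1 else -quarter)
                for i in range(3)
            )
            # New sub-task created dynamically; its cost is len(bucket).
            tasks.append((child_center, quarter, bucket, depth + 1))
    return leaves, processed


if __name__ == "__main__":
    random.seed(0)
    # Clustered input: most particles fall in one small corner of the cube.
    particles = [tuple(random.random() * 0.1 for _ in range(3)) for _ in range(300)]
    particles += [tuple(random.random() for _ in range(3)) for _ in range(50)]
    leaves, processed = build_octree(particles)
    print(len(leaves), "leaves from", processed, "tasks")
```

With clustered input such as the one in the usage example, most of the work ends up in a few deep branches of the tree, which is the kind of imbalance that a fixed up-front assignment of cells to processing units cannot anticipate.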