To get maximum performance from many-core graphics processors, it is important to balance the workload evenly so that all processing units contribute equally to the task at hand. This can be hard to achieve when the cost of a task is not known beforehand and when new sub-tasks are created dynamically during execution. With the recent advent of scatter operations and atomic hardware primitives, it is now possible to bring some of the more elaborate dynamic load balancing schemes from the conventional SMP systems domain to the graphics processor domain. We have compared four different dynamic load balancing methods to see which one is best suited to the highly parallel world of graphics processors. Three of these methods were lock-free and one was lock-based. We evaluated them on the task of creating an octree partitioning of a set of particles. The experiments showed that synchronization can be very expensive and that new methods that take better advantage of the graphics processors' features and capabilities might be required. They also showed that lock-free methods achieve better performance than blocking methods and that they can be made to scale with increased numbers of processing units.
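To make the workload concrete, the following is a minimal sketch of octree partitioning driven by a shared task queue. It is not the implementation evaluated in the paper: it runs on CPU threads in Python and uses a single lock-protected queue, which roughly corresponds to the blocking style of load balancing that the lock-free methods are compared against. All names (`split`, `build_octree`, `MAX_PARTICLES`, `MAX_DEPTH`) and the unit-cube domain are assumptions made for illustration. It shows why the work is hard to balance statically: a task's cost depends on how many particles land in its cell, and each split creates new sub-tasks at run time.

```python
import threading
import time
from collections import deque

MAX_PARTICLES = 4  # hypothetical leaf capacity
MAX_DEPTH = 8      # hypothetical depth cap so coincident points cannot split forever


def split(particles, center, half):
    """Sort particles into octants; return one sub-task per non-empty octant."""
    octants = [[] for _ in range(8)]
    for p in particles:
        # Bit a of the octant index is set when coordinate a lies in the upper half.
        idx = sum(1 << a for a in range(3) if p[a] >= center[a])
        octants[idx].append(p)
    quarter = half / 2
    tasks = []
    for idx, pts in enumerate(octants):
        if pts:
            child = tuple(
                center[a] + (quarter if (idx >> a) & 1 else -quarter)
                for a in range(3)
            )
            tasks.append((pts, child, quarter))
    return tasks


def build_octree(particles, num_workers=4):
    """Return the octree leaves as (center, half_width, particles) tuples."""
    lock = threading.Lock()
    # Root task covers the unit cube (an assumption of this sketch).
    queue = deque([(list(particles), (0.5, 0.5, 0.5), 0.5, 0)])
    pending = [1]  # tasks queued or in progress; guarded by lock
    leaves = []

    def worker():
        while True:
            with lock:
                if pending[0] == 0:
                    return  # no queued or running task remains
                task = queue.popleft() if queue else None
            if task is None:
                time.sleep(0)  # other workers may still create sub-tasks
                continue
            pts, center, half, depth = task
            if len(pts) <= MAX_PARTICLES or depth >= MAX_DEPTH:
                subtasks, leaf = [], (center, half, pts)
            else:
                subtasks = [
                    (p, c, h, depth + 1) for p, c, h in split(pts, center, half)
                ]
                leaf = None
            with lock:
                if leaf is not None:
                    leaves.append(leaf)
                queue.extend(subtasks)
                # This task is done; its sub-tasks are now outstanding.
                pending[0] += len(subtasks) - 1

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return leaves
```

Every queue access in this sketch goes through the same lock, so workers serialize on it whenever tasks are small. That contention is the kind of synchronization cost the abstract refers to, and avoiding the lock is the motivation for the lock-free alternatives.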