ACM Transactions on Programming Languages and Systems (TOPLAS)
A simple load balancing scheme for task allocation in parallel machines
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
A dynamic distributed load balancing algorithm with provable good performance
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Practical implementations of non-blocking synchronization primitives
PODC '97 Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing
Analyses of load stealing models based on differential equations
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
The Power of Two Choices in Randomized Load Balancing
IEEE Transactions on Parallel and Distributed Systems
The Natural Work-Stealing Algorithm is Stable
FOCS '01 Proceedings of the 42nd IEEE symposium on Foundations of Computer Science
Parallel garbage collection for shared memory multiprocessors
JVM'01 Proceedings of the 2001 Symposium on JavaTM Virtual Machine Research and Technology Symposium - Volume 1
Proceedings of the fourteenth annual ACM symposium on Parallel algorithms and architectures
Dynamic circular work-stealing deque
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
A dynamic-sized nonblocking work stealing deque
Distributed Computing - Special issue: DISC 04
Carbon: architectural support for fine-grained parallelism on chip multiprocessors
Proceedings of the 34th annual international symposium on Computer architecture
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS)
Fine-Grained Task Scheduling Using Adaptive Data Structures
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
A dynamic-sized nonblocking work stealing deque
A dynamic-sized nonblocking work stealing deque
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated
Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
SALSA: scalable and low synchronization NUMA-aware algorithm for producer-consumer pools
Proceedings of the twenty-fourth annual ACM symposium on Parallelism in algorithms and architectures
LIBKOMP, an efficient openMP runtime system for both fork-join and data flow paradigms
IWOMP'12 Proceedings of the 8th international conference on OpenMP in a Heterogeneous World
A new programming paradigm for GPGPU
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Scheduling parallel programs by work stealing with private deques
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware support for fine-grained event-driven computation in Anton 2
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Friendly barriers: efficient work-stealing with return barriers
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Hi-index | 0.00 |
The non-blocking work-stealing algorithm of Arora et al. has been gaining popularity as the multiprocessor load balancing technology of choice in both Industry and Academia. At its core is an ingenious scheme for stealing a single item in a non-blocking manner from an array based deque. In recent years, several researchers have argued that stealing more than a single item at a time allows for increased stability, greater overall balance, and improved performance.This paper presents StealHalf, a new generalization of the Arora et al. algorithm, that allows processes, instead of stealing one, to steal up to half of the items in a given queue at a time. The new algorithm preserves the key properties of the Arora et al. algorithm: it is non-blocking, and it minimizes the number of CAS operations that the local process needs to perform. We provide analysis that proves that the new algorithm provides better load distribution: the expected load of any process throughout the execution is less than a constant away from the overall system average.