Non-blocking steal-half work queues

Authors:
Danny Hendler; Nir Shavit
Affiliations:
Tel-Aviv University; Tel-Aviv University
Venue:
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Year:
2002

Abstract

The non-blocking work-stealing algorithm of Arora et al. has been gaining popularity as the multiprocessor load-balancing technology of choice in both industry and academia. At its core is an ingenious scheme for stealing a single item in a non-blocking manner from an array-based deque. In recent years, several researchers have argued that stealing more than a single item at a time allows for increased stability, greater overall balance, and improved performance.

This paper presents StealHalf, a new generalization of the Arora et al. algorithm that allows a process to steal up to half of the items in a given queue at a time, rather than a single item. The new algorithm preserves the key properties of the Arora et al. algorithm: it is non-blocking, and it minimizes the number of CAS operations that the local process needs to perform. We provide an analysis proving that the new algorithm achieves better load distribution: the expected load of any process throughout the execution is less than a constant away from the overall system average.
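The steal-half policy described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it uses a lock where StealHalf is non-blocking and CAS-based, and the class name `StealHalfQueue`, its method names, and the choice to round the stolen count up (so that a lone item can still be stolen) are assumptions made for illustration only. It shows only the policy itself: the owner works at the bottom of its queue, while a thief removes up to half of the items from the top in a single operation.

```python
import threading
from collections import deque


class StealHalfQueue:
    """Lock-based illustration of the steal-half policy.

    Hypothetical sketch: the paper's algorithm is non-blocking and uses CAS;
    a lock is used here only to keep the example short.
    """

    def __init__(self, items=()):
        # Left end of the deque is the "top" (oldest items),
        # right end is the "bottom" (where the owner works).
        self._items = deque(items)
        self._lock = threading.Lock()

    def push_bottom(self, item):
        """Owner adds a new item at the bottom."""
        with self._lock:
            self._items.append(item)

    def pop_bottom(self):
        """Owner removes the newest item, or gets None if the queue is empty."""
        with self._lock:
            return self._items.pop() if self._items else None

    def steal_half(self):
        """Thief removes and returns up to half of the items, from the top."""
        with self._lock:
            # Assumption: round up, so a queue holding one item can be stolen from.
            count = (len(self._items) + 1) // 2
            return [self._items.popleft() for _ in range(count)]

    def __len__(self):
        with self._lock:
            return len(self._items)


# Usage: an idle process steals half of a loaded process's work.
victim = StealHalfQueue(range(8))
thief = StealHalfQueue()
for task in victim.steal_half():
    thief.push_bottom(task)
print(len(victim), len(thief))  # 4 4
```

After one steal the two queues hold the same number of items, which is the intuition behind the load-distribution claim: a single steal moves the thief and the victim toward equal load, instead of transferring one item at a time.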