A dynamic-sized nonblocking work stealing deque

Authors:
Danny Hendler;Yossi Lev;Mark Moir;Nir Shavit
Affiliations:
Tel-Aviv University;Brown University & Sun Microsystems Laboratories;Sun Microsystems Laboratories;Sun Microsystems Laboratories & Tel-Aviv University
Venue:
Distributed Computing - Special issue: DISC 04
Year:
2006


Abstract

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based double-ended queues (deques) with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm's synchronization protocol relies heavily on fixed-size arrays, which are prone to overflows, especially in the multiprogrammed environments for which they are designed. This is a significant drawback: apart from memory inefficiency, it means that the size of the deque must be tailored to accommodate the effects of the hard-to-predict level of multiprogramming, and the implementation must include an expensive and application-specific overflow mechanism.

This paper presents the first dynamic memory work-stealing algorithm. It is based on a novel way of building non-blocking dynamic-sized work-stealing deques by detecting synchronization conflicts based on "pointer-crossing" rather than "gaps between indexes" as in the original ABP algorithm. As we show, the new algorithm dramatically increases robustness and memory efficiency, while causing applications no observable performance penalty. We therefore believe it can replace array-based ABP work-stealing deques, eliminating the need for application-specific overflow mechanisms.
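
The contrast between a fixed-size array deque and a dynamic-sized one can be illustrated with a small sequential model. This is only a sketch under stated assumptions: it omits all synchronization (no compare-and-swap, no conflict detection), so it is not the non-blocking algorithm itself, and the linked list of small array nodes is an assumed layout chosen for illustration, since the abstract does not specify one. All class and method names (`FixedArrayDeque`, `LinkedNodeDeque`, `push_bottom`, `pop_bottom`, `pop_top`) are hypothetical.

```python
class FixedArrayDeque:
    """Sequential model of an array-based deque; synchronization is omitted."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.top = 0      # oldest task; thieves take from this end
        self.bottom = 0   # next free slot; the owner works at this end

    def push_bottom(self, task):
        if self.bottom == len(self.slots):
            # The case that forces an application-specific overflow mechanism.
            raise OverflowError("fixed-size deque is full")
        self.slots[self.bottom] = task
        self.bottom += 1

    def pop_bottom(self):
        if self.top == self.bottom:
            return None
        self.bottom -= 1
        return self.slots[self.bottom]

    def pop_top(self):
        if self.top == self.bottom:
            return None
        task = self.slots[self.top]
        self.top += 1
        return task


class Node:
    """One small array in the list (assumed layout, for illustration only)."""

    def __init__(self, size):
        self.slots = [None] * size
        self.prev = None   # neighbour toward the top (older tasks)
        self.next = None   # neighbour toward the bottom (newer tasks)


class LinkedNodeDeque:
    """Sequential model of a deque that grows and shrinks node by node."""

    def __init__(self, node_size=4):
        self.node_size = node_size
        self.top_node = self.bottom_node = Node(node_size)
        self.top_idx = 0      # next slot a thief would take
        self.bottom_idx = 0   # next free slot for the owner

    def is_empty(self):
        # Empty when the two (node, index) positions coincide.
        return (self.top_node is self.bottom_node
                and self.top_idx == self.bottom_idx)

    def push_bottom(self, task):
        if self.bottom_idx == self.node_size:   # node full: link a new one
            node = Node(self.node_size)
            node.prev = self.bottom_node
            self.bottom_node.next = node
            self.bottom_node, self.bottom_idx = node, 0
        self.bottom_node.slots[self.bottom_idx] = task
        self.bottom_idx += 1

    def pop_bottom(self):
        if self.is_empty():
            return None
        self.bottom_idx -= 1
        task = self.bottom_node.slots[self.bottom_idx]
        self.bottom_node.slots[self.bottom_idx] = None
        if self.bottom_idx == 0 and self.bottom_node.prev is not None:
            # Node drained: unlink it so memory shrinks with the workload.
            self.bottom_node = self.bottom_node.prev
            self.bottom_node.next = None
            self.bottom_idx = self.node_size
        return task

    def pop_top(self):
        if self.is_empty():
            return None
        if self.top_idx == self.node_size:      # node consumed: advance
            self.top_node = self.top_node.next
            self.top_node.prev = None
            self.top_idx = 0
        task = self.top_node.slots[self.top_idx]
        self.top_node.slots[self.top_idx] = None
        self.top_idx += 1
        return task
```

A short usage sketch: pushing more tasks than one node holds links further nodes instead of overflowing, while the fixed-size model raises `OverflowError` once its array is full.

```python
tasks = LinkedNodeDeque(node_size=4)
for i in range(10):          # exceeds one node; new nodes are linked as needed
    tasks.push_bottom(i)
stolen = tasks.pop_top()     # a thief takes the oldest task
local = tasks.pop_bottom()   # the owner takes the newest task
```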