Dynamic circular work-stealing deque

Authors:
David Chase; Yossi Lev
Affiliations:
Sun Microsystems Laboratories, Burlington, MA; Brown University & Sun Microsystems Laboratories, Burlington, MA
Venue:
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Year:
2005

Citing 6

The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
Scheduling multithreaded computations by work stealing. Journal of the ACM (JACM)
The data locality of work stealing. Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Non-blocking steal-half work queues. Proceedings of the twenty-first annual symposium on Principles of distributed computing
The Repeat Offender Problem: A Mechanism for Supporting Dynamic-Sized, Lock-Free Data Structures. DISC '02 Proceedings of the 16th International Conference on Distributed Computing
Parallel garbage collection for shared memory multiprocessors. JVM'01 Proceedings of the 2001 Symposium on Java™ Virtual Machine Research and Technology Symposium - Volume 1

Cited 31

Carbon: architectural support for fine-grained parallelism on chip multiprocessors. Proceedings of the 34th annual international symposium on Computer architecture
Adaptive work-stealing with parallelism feedback. ACM Transactions on Computer Systems (TOCS)
Idempotent work stealing. Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors. Proceedings of the 36th annual international symposium on Computer architecture
Runtime support for multicore Haskell. Proceedings of the 14th ACM SIGPLAN international conference on Functional programming
Lazy binary-splitting: a run-time adaptive work-stealing scheduler. Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
A dynamic-sized nonblocking work stealing deque
Flexible architectural support for fine-grain scheduling. Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
An adaptive task creation strategy for work-stealing scheduling. Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Simplifying concurrent algorithms by exploiting hardware transactional memory. Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Generic design of Chinese remaindering schemes. Proceedings of the 4th International Workshop on Parallel and Symbolic Computation
Granularity-Aware Work-Stealing for Computationally-Uniform Grids. CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Dynamic parallelization of recursive code: part 1: managing control flow interactions with the continuator. Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Data structures in the multicore age. Communications of the ACM
Laws of order: expensive synchronization in concurrent algorithms cannot be eliminated. Proceedings of the 38th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Automatic inference of memory fences. Proceedings of the 2010 Conference on Formal Methods in Computer-Aided Design
Obstruction-Free algorithms can be practically wait-free. DISC'05 Proceedings of the 19th international conference on Distributed Computing
OpenMP task scheduling strategies for multicore NUMA systems. International Journal of High Performance Computing Applications
Dynamic synthesis for relaxed memory models. Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Automatic inference of memory fences. ACM SIGACT News
Server-based scheduling of parallel real-time tasks. Proceedings of the tenth ACM international conference on Embedded software
OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Speeding up OpenMP tasking. Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
A new programming paradigm for GPGPU. Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Correct and efficient work-stealing for weak memory models. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling parallel programs by work stealing with private deques. Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hardware support for fine-grained event-driven computation in Anton 2. Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
Address-aware fences. Proceedings of the 27th international ACM conference on International conference on supercomputing
Freeze after writing: quasi-deterministic parallel programming with LVars. Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages
Fence-free work stealing on bounded TSO processors. Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Well-structured futures and cache locality. Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming

Abstract

The non-blocking work-stealing algorithm of Arora, Blumofe, and Plaxton (henceforth ABP work-stealing) is on its way to becoming the multiprocessor load balancing technology of choice in both industry and academia. This highly efficient scheme is based on a collection of array-based double-ended queues (deques) with low-cost synchronization among local and stealing processes. Unfortunately, the algorithm's synchronization protocol is strongly based on the use of fixed-size arrays, which are prone to overflows, especially in the multiprogrammed environments for which they are designed. We present a work-stealing deque that does not have the overflow problem.

The only ABP-style work-stealing algorithm that eliminates the overflow problem is the list-based one presented by Hendler, Lev and Shavit. Their algorithm indeed deals with the overflow problem, but it is complicated, and introduces a trade-off between the space and time complexity, due to the extra work required to maintain the list.

Our new algorithm presents a simple lock-free work-stealing deque, which stores the elements in a cyclic array that can grow when it overflows. The algorithm has no limit other than integer overflow (and the system's memory size) on the number of elements that may be on the deque, and the total memory required is linear in the number of elements in the deque.
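
To make the idea of a growable cyclic array concrete, here is a minimal Python sketch of a work-stealing deque in this style. The abstract does not spell out the operations, so the class and method names (`CircularArray`, `WorkStealingDeque`, `push_bottom`, `pop_bottom`, `steal`) and the `EMPTY`/`ABORT` sentinels are illustrative assumptions. Python has no hardware compare-and-swap, so `_cas_top` emulates one with a small lock; a real lock-free implementation would use an atomic CAS instruction and the appropriate memory fences instead. The sketch only grows the array and never shrinks it, and Python integers do not overflow, so the integer-overflow limit mentioned above does not arise here.

```python
import threading

EMPTY = object()  # returned when the deque holds no elements
ABORT = object()  # returned when a steal loses a race; the caller may retry


class CircularArray:
    """Ring buffer addressed by ever-increasing logical indices."""

    def __init__(self, log_size):
        self.log_size = log_size
        self.items = [None] * (1 << log_size)

    def size(self):
        return 1 << self.log_size

    def get(self, i):
        return self.items[i % self.size()]

    def put(self, i, item):
        self.items[i % self.size()] = item

    def grow(self, bottom, top):
        """Return a ring twice as large holding the live range [top, bottom)."""
        bigger = CircularArray(self.log_size + 1)
        for i in range(top, bottom):
            bigger.put(i, self.get(i))
        return bigger


class WorkStealingDeque:
    def __init__(self, log_initial_size=2):
        self.array = CircularArray(log_initial_size)
        self.top = 0      # next index thieves take from
        self.bottom = 0   # next index the owner writes to
        self._top_lock = threading.Lock()  # assumption: stands in for a CAS

    def _cas_top(self, expected, new):
        """Emulated compare-and-swap on top."""
        with self._top_lock:
            if self.top != expected:
                return False
            self.top = new
            return True

    def push_bottom(self, item):
        """Owner only: append an item, growing the ring if it is full."""
        b, t, a = self.bottom, self.top, self.array
        if b - t >= a.size() - 1:
            a = a.grow(b, t)
            self.array = a
        a.put(b, item)
        self.bottom = b + 1

    def pop_bottom(self):
        """Owner only: remove the most recently pushed item."""
        b = self.bottom - 1
        a = self.array
        self.bottom = b
        t = self.top
        size = b - t
        if size < 0:
            self.bottom = t  # deque was already empty; restore bottom
            return EMPTY
        item = a.get(b)
        if size > 0:
            return item      # more than one element: no race with thieves
        if not self._cas_top(t, t + 1):
            item = EMPTY     # a thief took the last element first
        self.bottom = t + 1
        return item

    def steal(self):
        """Any thread: try to remove the oldest item."""
        t, b, a = self.top, self.bottom, self.array
        if b - t <= 0:
            return EMPTY
        item = a.get(t)
        if not self._cas_top(t, t + 1):
            return ABORT
        return item
```

In this sketch the owner works at the bottom end without synchronization in the common case, and only the removal of the last remaining element and every steal go through the compare-and-swap on `top`. Because elements are addressed by logical index modulo the array size, growing the array only requires copying the live range into a larger ring; the indices themselves stay unchanged.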