Idempotent work stealing

Authors:
Maged M. Michael;Martin T. Vechev;Vijay A. Saraswat
Affiliations:
IBM Thomas J. Watson Research Center, Yorktown Heights, NY, USA;IBM Thomas J. Watson Research Center, Hawthorne, NY, USA;IBM Thomas J. Watson Research Center, Hawthorne, NY, USA
Venue:
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
2009

Abstract

Load balancing is a technique that allows efficient parallelization of irregular workloads, and it is a key component of many applications and parallelizing runtimes. Work stealing is a popular technique for implementing load balancing, in which each parallel thread maintains its own work set of items and occasionally steals items from the sets of other threads. The conventional semantics of work stealing guarantee that each inserted task is eventually extracted exactly once. However, the correctness of a wide class of applications allows for relaxed semantics, because either (i) the application already explicitly checks that no work is repeated, or (ii) the application can tolerate repeated work. In this paper, we introduce idempotent work stealing and present several new algorithms that exploit the relaxed semantics to deliver better performance. The semantics of the new algorithms guarantee that each inserted task is eventually extracted at least once, instead of exactly once. On mainstream processors, algorithms for conventional work stealing require special atomic instructions or store-load memory ordering fence instructions in the owner's critical path operations. In general, these instructions are substantially slower than regular memory access instructions. By exploiting the relaxed semantics, our algorithms avoid these instructions in the owner's operations. We evaluated our algorithms using common graph problems and micro-benchmarks and compared them to well-known conventional work stealing algorithms, the THE Cilk and Chase-Lev algorithms. We found that our best algorithm (with LIFO extraction) outperforms existing algorithms in nearly all cases, and often by significant margins.
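
To make case (i) above concrete, the following is a minimal single-threaded sketch of why an application that already checks for repeated work stays correct under at-least-once extraction. It is not one of the paper's algorithms: `AtLeastOnceWorkSet`, its `copies` parameter, and `reachable` are hypothetical names introduced only for illustration, and the duplicate deliveries are simulated deterministically rather than produced by a real race between an owner and a thief.

```python
class AtLeastOnceWorkSet:
    """Toy LIFO work set (hypothetical) that delivers each task `copies` times.

    copies=1 models conventional exactly-once extraction; copies=2 models
    the relaxed at-least-once semantics by handing every task out twice.
    """

    def __init__(self, copies=2):
        self._copies = copies
        self._entries = []  # each entry is [task, deliveries_left]

    def put(self, task):
        self._entries.append([task, self._copies])

    def take(self):
        """Return the most recently inserted pending task, or None if empty."""
        if not self._entries:
            return None
        entry = self._entries[-1]
        entry[1] -= 1
        if entry[1] == 0:
            self._entries.pop()
        return entry[0]


def reachable(graph, root, copies=2):
    """Return (vertices reachable from root, number of extractions)."""
    work = AtLeastOnceWorkSet(copies)
    visited = set()
    extractions = 0
    work.put(root)
    while True:
        v = work.take()
        if v is None:
            break
        extractions += 1
        # The explicit check: a task that was already processed is skipped,
        # so a duplicate extraction costs time but cannot change the result.
        if v in visited:
            continue
        visited.add(v)
        for w in graph.get(v, ()):
            if w not in visited:
                work.put(w)
    return visited, extractions


# Small directed graph; vertex 4 is not reachable from vertex 0.
graph = {0: [1, 2], 1: [3], 2: [3], 3: [], 4: [0]}

exact_set, exact_count = reachable(graph, 0, copies=1)
relaxed_set, relaxed_count = reachable(graph, 0, copies=2)
print(exact_set == relaxed_set, exact_count, relaxed_count)
```

Both runs compute the same reachable set; the relaxed run only performs extra extractions that the `visited` check discards. In a real parallel traversal the check itself would need to be safe under concurrency, which this sketch does not model.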