Steal Tree: low-overhead tracing of work stealing schedulers

Authors:
Jonathan Lifflander;Sriram Krishnamoorthy;Laxmikant V. Kale
Affiliations:
University of Illinois Urbana-Champaign, Urbana, Illinois, USA;Pacific Northwest National Lab, Richland, Washington, USA;University of Illinois Urbana-Champaign, Urbana, Illinois, USA
Venue:
Proceedings of the 34th ACM SIGPLAN conference on Programming language design and implementation
Year:
2013

Abstract

Work stealing is a popular approach to scheduling task-parallel programs. The flexibility inherent in work stealing when dealing with load imbalance results in seemingly irregular computation structures, complicating the study of its runtime behavior. In this paper, we present an approach to efficiently trace async-finish parallel programs scheduled using work stealing. We identify key properties that allow us to trace the execution of tasks with low time and space overheads. We also study the usefulness of the proposed schemes in supporting algorithms for data-race detection and retentive stealing presented in the literature. We demonstrate that the perturbation due to tracing is within the variation in the execution time with 99% confidence and the traces are concise, amounting to a few tens of kilobytes per thread in most cases. We also demonstrate that the traces enable significant reductions in the cost of detecting data races and result in low, stable space overheads in supporting retentive stealing for async-finish programs.