Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Executing multithreaded programs efficiently
Executing multithreaded programs efficiently
On-the-fly maintenance of series-parallel relationships in fork-join multithreaded programs
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Achieving high performance on extremely large parallel machines: performance prediction and load balancing
X10: concurrent programming for modern architectures
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Nested parallelism in transactional memory
Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Intel threading building blocks
Intel threading building blocks
IEEE Transactions on Parallel and Distributed Systems
Work-first and help-first scheduling policies for async-finish task parallelism
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Scalable and precise dynamic datarace detection for structured parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Work stealing and persistence-based load balancers for iterative overdecomposed applications
Proceedings of the 21st international symposium on High-Performance Parallel and Distributed Computing
pGraph: Efficient Parallel Construction of Large-Scale Protein Sequence Homology Graphs
IEEE Transactions on Parallel and Distributed Systems
Hi-index | 0.00 |
Work stealing is a popular approach to scheduling task-parallel programs. The flexibility inherent in work stealing when dealing with load imbalance results in seemingly irregular computation structures, complicating the study of its runtime behavior. In this paper, we present an approach to efficiently trace async-finish parallel programs scheduled using work stealing. We identify key properties that allow us to trace the execution of tasks with low time and space overheads. We also study the usefulness of the proposed schemes in supporting algorithms for data-race detection and retentive stealing presented in the literature. We demonstrate that the perturbation due to tracing is within the variation in the execution time with 99% confidence and the traces are concise, amounting to a few tens of kilobytes per thread in most cases. We also demonstrate that the traces enable significant reductions in the cost of detecting data races and result in low, stable space overheads in supporting retentive stealing for async-finish programs.