Real-time garbage collection on general-purpose machines
Journal of Systems and Software
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
A simple load balancing scheme for task allocation in parallel machines
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
Debugging optimized code with dynamic deoptimization
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
A dynamic distributed load balancing algorithm with provable good performance
SPAA '93 Proceedings of the fifth annual ACM symposium on Parallel algorithms and architectures
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Analyses of load stealing models based on differential equations
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Proceedings of the ACM 2000 conference on Java Grande
Adaptive optimization in the Jalapeño JVM
OOPSLA '00 Proceedings of the 15th ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Non-blocking steal-half work queues
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Design, implementation and evaluation of adaptive recompilation with on-stack replacement
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The Natural Work-Stealing Algorithm is Stable
SIAM Journal on Computing
IBM Systems Journal
Real-Time GC in JeRTy"VM Using the Return-Barrier Method
ISORC '05 Proceedings of the Eighth IEEE International Symposium on Object-Oriented Real-Time Distributed Computing
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Experiences with Multi-threading and Dynamic Class Loading in a Java Just-In-Time Compiler
Proceedings of the International Symposium on Code Generation and Optimization
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Intel threading building blocks
Intel threading building blocks
A lock-free, concurrent, and incremental stack scanning for garbage collectors
Proceedings of the 2009 ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Work-first and help-first scheduling policies for async-finish task parallelism
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Flexible architectural support for fine-grain scheduling
Proceedings of the fifteenth edition of ASPLOS on Architectural support for programming languages and operating systems
UTS: an unbalanced tree search benchmark
LCPC'06 Proceedings of the 19th international conference on Languages and compilers for parallel computing
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Habanero-Java: the new adventures of old X10
Proceedings of the 9th International Conference on Principles and Practice of Programming in Java
Practical permissions for race-free parallelism
ECOOP'12 Proceedings of the 26th European conference on Object-Oriented Programming
Work-stealing without the baggage
Proceedings of the ACM international conference on Object oriented programming systems languages and applications
Scheduling parallel programs by work stealing with private deques
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Isolation for nested task parallelism
Proceedings of the 2013 ACM SIGPLAN international conference on Object oriented programming systems languages & applications
Hi-index | 0.00 |
This paper addresses the problem of efficiently supporting parallelism within a managed runtime. A popular approach for exploiting software parallelism on parallel hardware is task parallelism, where the programmer explicitly identifies potential parallelism and the runtime then schedules the work. Work-stealing is a promising scheduling strategy that a runtime may use to keep otherwise idle hardware busy while relieving overloaded hardware of its burden. However, work-stealing comes with substantial overheads. Recent work identified sequential overheads of work-stealing, those that occur even when no stealing takes place, as a significant source of overhead. That work was able to reduce sequential overheads to just 15%. In this work, we turn to dynamic overheads, those that occur each time a steal takes place. We show that the dynamic overhead is dominated by introspection of the victim's stack when a steal takes place. We exploit the idea of a low overhead return barrier to reduce the dynamic overhead by approximately half, resulting in total performance improvements of as much as 20%. Because, unlike prior work, we attack the overheads directly due to stealing and therefore attack the overheads that grow as parallelism grows, we improve the scalability of work-stealing applications. This result is complementary to recent work addressing the sequential overheads of work-stealing. This work therefore substantially relieves work-stealing of the increasing pressure due to increasing intra-node hardware parallelism.