The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Work-first and help-first scheduling policies for async-finish task parallelism
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
Hierarchical place trees: a portable abstraction for task parallelism and data movement
LCPC'09 Proceedings of the 22nd international conference on Languages and Compilers for Parallel Computing
Proceedings of the twenty-second annual ACM symposium on Parallelism in algorithms and architectures
Comparing the usability of library vs. language approaches to task parallelism
Evaluation and Usability of Programming Languages and Tools
Which problems does a multi-language virtual machine need to solve in the multicore/manycore era?
Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11
Shared work list: hacking amorphous data parallelism in UPC
Proceedings of the 2012 International Workshop on Programming Models and Applications for Multicores and Manycores
Mapping a data-flow programming model onto heterogeneous platforms
Proceedings of the 13th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, Tools and Theory for Embedded Systems
Scalable and precise dynamic datarace detection for structured parallelism
Proceedings of the 33rd ACM SIGPLAN conference on Programming Language Design and Implementation
Server-based scheduling of parallel real-time tasks
Proceedings of the tenth ACM international conference on Embedded software
Characterizing and mitigating work time inflation in task parallel programs
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Prefetching and cache management using task lifetimes
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hybrid parallel task placement in X10
Proceedings of the third ACM SIGPLAN X10 Workshop
Locality-aware task management for unstructured parallelism: a quantitative limit study
Proceedings of the twenty-fifth annual ACM symposium on Parallelism in algorithms and architectures
Friendly barriers: efficient work-stealing with return barriers
Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments
Characterizing and mitigating work time inflation in task parallel programs
Scientific Programming - Selected Papers from Super Computing 2012
Hi-index | 0.00 |
This poster introduces SLAW, a Scalable Locality-aware Adaptive Work-stealing scheduler. The SLAW features an adaptive task scheduling algorithm combined with a locality-aware scheduling framework. Past work has demonstrated the pros and cons of using fixed scheduling policies, such as work-first and help-first, in different cases without a clear winner. Prior work also assumes the availability and successful execution of a serial version of the parallel program. This assumption can limit the expressiveness of dynamic task parallel languages. The SLAW scheduler supports both work-first and help-first policies simultaneously. It does so by using an adaptive approach that selects a scheduling policy on a per-task basis at runtime. The SLAW scheduler also establishes bounds on the stack usage and the heap space needed to store tasks. The experimental results for the benchmarks studied show that SLAW's adaptive scheduler achieves 0.98x - 9.2$x speedup over the help-first scheduler and 0.97x - 4.5x speedup over the work-first scheduler for 64-thread executions, thereby establishing the robustness of using an adaptive approach instead of a fixed policy. In contrast, the help-first policy is 9.2x slower than work-first in the worst case for a fixed help-first policy, and the work-first policy is 3.7x slower than help-first in the worst case for a fixed work-first policy. Further, for large irregular recursive parallel computations, the adaptive scheduler runs with bounded stack usage and achieves performance (and supports data sizes) that cannot be delivered by the use of any single fixed policy. The SLAW scheduler is designed for programming models where locality hints are provided to the runtime by the programmer or compiler, and achieves locality-awareness by grouping workers into places. Locality awareness can lead to improved performance by increasing temporal data reuse within a worker and among workers in the same place. Our experimental results show that locality-aware scheduling can achieve up to 2.6x speedup over locality-oblivious scheduling, for the benchmarks studied.