Work-first and help-first scheduling policies for async-finish task parallelism

Authors:
Yi Guo;Rajkishore Barik;Raghavan Raman;Vivek Sarkar
Affiliations:
Department of Computer Science, Rice University, USA;Department of Computer Science, Rice University, USA;Department of Computer Science, Rice University, USA;Department of Computer Science, Rice University, USA
Venue:
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel & Distributed Processing
Year:
2009

Abstract

Multiple programming models are emerging to address an increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in Cilk's implementation of dynamic spawn-sync parallelism, are gaining popularity but also have inherent limitations. In this paper, we address the problem of efficient and scalable implementation of X10's async-finish task parallelism, which is more general than Cilk's spawn-sync parallelism. We introduce a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies. Performance results on two different multicore SMP platforms show significant improvements from our new work-stealing algorithm over the existing work-sharing scheduler for X10, and also provide insights into the scenarios in which the help-first policy yields better results than the work-first policy and vice versa.
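
The abstract names the two policies without spelling out how they differ, so a small illustration may help. The sketch below is not the paper's scheduler: it is a single-worker Python simulation with no thieves, and the function names `work_first` and `help_first` are hypothetical. It assumes the usual reading of the two policies: under work-first the worker runs a spawned task at once and leaves the parent's continuation on its deque for others to steal, while under help-first the worker leaves the spawned task on its deque and keeps executing the parent. Each function traces a program of the form `finish { for i: async task(i) }`.

```python
from collections import deque


def work_first(num_tasks):
    """Trace one worker, with no thieves, under the work-first policy."""
    ready = deque()  # the worker's deque; a thief would steal from the other end
    trace = []
    for i in range(num_tasks):
        trace.append(f"spawn {i}")
        ready.append(("continuation", i))  # parent's continuation becomes stealable
        trace.append(f"run {i}")           # the worker executes the child immediately
        ready.pop()                        # nobody stole it, so the worker resumes the parent
    return trace


def help_first(num_tasks):
    """Trace one worker, with no thieves, under the help-first policy."""
    ready = deque()
    trace = []
    for i in range(num_tasks):
        trace.append(f"spawn {i}")
        ready.append(i)                    # child task becomes stealable; parent keeps going
    while ready:                           # end of finish: drain the local deque, newest first
        trace.append(f"run {ready.pop()}")
    return trace


if __name__ == "__main__":
    print(work_first(3))
    print(help_first(3))
```

In this simulation, work-first reproduces the serial depth-first order, whereas help-first exposes every child task before any of them runs. That difference is one plausible intuition for why the better policy depends on the scenario, for example on how often tasks are stolen.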