Multiple programming models are emerging to address the increased need for dynamic task parallelism in applications for multicore processors and shared-address-space parallel computing. Examples include OpenMP 3.0, Java Concurrency Utilities, Microsoft Task Parallel Library, Intel Threading Building Blocks, Cilk, X10, Chapel, and Fortress. Scheduling algorithms based on work stealing, as embodied in Cilk's implementation of dynamic spawn-sync parallelism, are gaining in popularity but also have inherent limitations. In this paper, we address the problem of efficient and scalable implementation of X10's async-finish task parallelism, which is more general than Cilk's spawn-sync parallelism. We introduce a new work-stealing scheduler with compiler support for async-finish task parallelism that can accommodate both work-first and help-first scheduling policies. Performance results on two different multicore SMP platforms show significant improvements from our new work-stealing algorithm compared to the existing work-sharing scheduler for X10. The results also provide insights into the scenarios in which the help-first policy outperforms the work-first policy, and vice versa.
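The two policies differ only in what a worker does at an `async`: under work-first it pushes the parent's continuation onto its deque and runs the child at once (as Cilk does), while under help-first it pushes the child and keeps running the parent. The sketch below illustrates that difference with a single worker. It is a toy Python simulation under stated assumptions, not the paper's X10 runtime or compiler transformation: task bodies are modeled as generators (each `yield child` stands for `async child`), and the names `node`, `execute`, and `run_worker` are hypothetical. Stealing and `finish` bookkeeping across multiple workers are not modeled.

```python
from collections import deque
from typing import Deque, Generator, List, Sequence

# Hypothetical task model (not the paper's X10 runtime): a task body is a
# Python generator, and each `yield child` stands for `async child`.
# The suspended generator left behind is the parent's continuation.
Task = Generator


def node(name: str, children: Sequence[tuple], trace: List[str]) -> Task:
    """Record this task's own work, then spawn one async child per subtree."""
    trace.append(name)
    for child_name, grandchildren in children:
        yield node(child_name, grandchildren, trace)  # async node(...)


def execute(task: Task, dq: Deque[Task], policy: str) -> None:
    """Run `task` until it finishes, applying `policy` at every spawn."""
    while True:
        try:
            child = next(task)  # run the task body up to its next async
        except StopIteration:
            return  # task (or continuation) completed
        if policy == "help-first":
            dq.append(child)    # expose the child; keep running the parent
        elif policy == "work-first":
            dq.append(task)     # expose the parent's continuation...
            task = child        # ...and run the child immediately
        else:
            raise ValueError(f"unknown policy: {policy}")


def run_worker(root: Task, policy: str) -> None:
    """Single worker: it pushes and pops at the right end of the deque (LIFO).

    In a real work-stealing runtime, thieves would take from the left end.
    With one worker, draining the deque runs every task to completion.
    """
    dq: Deque[Task] = deque([root])
    while dq:
        execute(dq.pop(), dq, policy)


tree = ("root", [("A", [("A1", []), ("A2", [])]), ("B", [])])
for policy in ("work-first", "help-first"):
    trace: List[str] = []
    run_worker(node(*tree, trace), policy)
    print(policy, trace)
```

With one worker, work-first visits the tasks in the serial depth-first order (`root, A, A1, A2, B`), while help-first runs the most recently spawned sibling first (`root, B, A, A2, A1`). In this model, a thief taking from the left end of the deque right after `root` spawns would obtain the oldest child `A` under help-first, but the continuation of `root` under work-first.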