Workcrews: an abstraction for controlling parallelism
International Journal of Parallel Programming
New ideas in parallel Lisp: language design, implementations, and programming tools
Proceedings of the US/Japan workshop on Parallel Lisp on Parallel Lisp: languages and systems
Leapfrogging: a portable technique for implementing efficient futures
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Lazy threads: implementing a fast parallel call
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Revised5 report on the algorithmic language scheme
ACM SIGPLAN Notices
StackThreads/MP: integrating futures into calling standards
Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient load balancing for wide-area divide-and-conquer applications
PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Lazy Remote Procedure Call and its Implementation in a Parallel Variant of C
PSLS '95 Proceedings of the International Workshop on Parallel Symbolic Languages and Systems
A Message Passing Implementation of Lazy Task Creation
Proceedings of the US/Japan Workshop on Parallel Symbolic Computing: Languages, Systems, and Applications
Cilk: Efficient Multithreaded Computing
Cilk: Efficient Multithreaded Computing
Indolent Closure Creation
X10: an object-oriented approach to non-uniform cluster computing
OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Intel threading building blocks
Intel threading building blocks
Lightweight lexical closures for legitimate execution stack access
CC'06 Proceedings of the 15th international conference on Compiler Construction
An adaptive task creation strategy for work-stealing scheduling
Proceedings of the 8th annual IEEE/ACM international symposium on Code generation and optimization
Hardware/software support for adaptive work-stealing in on-chip multiprocessor
Journal of Systems Architecture: the EUROMICRO Journal
Granularity-Aware Work-Stealing for Computationally-Uniform Grids
CCGRID '10 Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing
Oracle scheduling: controlling granularity in implicitly parallel languages
Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
Scheduling parallel programs by work stealing with private deques
Proceedings of the 18th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
High-productivity languages for parallel computing become more important as parallel environments including multicores become more common. Cilk is such a language. It provides good load balancing for many applications including irregular ones; that is, it keeps all workers busy by creating plenty of "logical" threads and adopting the oldest-first work stealing strategy. This paper proposes a "logical thread"-free framework called Tascell, which achieves a higher performance and supports a wider range of parallel environments including clusters without loss of productivity. A Tascell worker spawns a "real" task only when requested by another idle worker. The worker performs the spawning by temporarily "backtracking" and restoring its oldest task-spawnable state. Our approach eliminates the cost of spawning/managing logical threads. It also promotes the reuse of workspaces and improves the locality of reference since it does not need to prepare a workspace for each concurrently runnable logical thread. Furthermore, Tascell enables elegant and highly-efficient backtrack search algorithms with delayed workspace copying. For instance, our 16-queens problem solver is 1.86 times faster than Cilk on a system with two dual-core processors. Our approach also enables a single program to run in both shared and distributed memory environments with reasonable efficiency and scalability.