MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
A report on the Sisal language project
Journal of Parallel and Distributed Computing - Special issue: data-flow processing
Space-efficient scheduling of multithreaded computations
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Lazy threads: implementing a fast parallel call
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Space-efficient scheduling of parallelism with synchronization variables
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Proceedings of the ninth annual ACM symposium on Parallel algorithms and architectures
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
Proceedings of the ACM 2000 conference on Java Grande
The Definition of Standard ML
Implementation of multilisp: Lisp on a multiprocessor
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Whole-program compilation in MLton
Proceedings of the 2006 workshop on ML
Exploiting coarse-grained task, data, and pipeline parallelism in stream programs
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Deadlock-free scheduling of X10 computations with bounded resources
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Space profiling for parallel functional programs
Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
Solving Large, Irregular Graph Problems Using Adaptive Work-Stealing
ICPP '08 Proceedings of the 2008 37th International Conference on Parallel Processing
Implicitly threaded parallelism in manticore
Journal of Functional Programming
A work-stealing scheduler for X10's task parallelism with suspension
Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
A meta-scheduler for the par-monad: composable scheduling for the heterogeneous cloud
Proceedings of the 17th ACM SIGPLAN international conference on Functional programming
LVars: lattice-based data structures for deterministic parallelism
Proceedings of the 2nd ACM SIGPLAN workshop on Functional high-performance computing
Well-structured futures and cache locality
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
Hi-index | 0.00 |
Work stealing is a popular method of scheduling fine-grained parallel tasks. The performance of work stealing has been extensively studied, both theoretically and empirically, but primarily for the restricted class of nested-parallel (or fully strict) computations. We extend this prior work by considering a broader class of programs that also supports pipelined parallelism through the use of parallel futures. Though the overhead of work-stealing schedulers is often quantified in terms of the number of steals, we show that a broader metric, the number of deviations, is a better way to quantify work-stealing overhead for less restrictive forms of parallelism, including parallel futures. For such parallelism, we prove bounds on work-stealing overheads--scheduler time and cache misses--as a function of the number of deviations. Deviations can occur, for example, when work is stolen or when a future is touched. We also show instances where deviations can occur independently of steals and touches. Next, we prove that, under work stealing, the expected number of deviations is O(Pd + td) in a P-processor execution of a computation with span d and t touches of futures. Moreover, this bound is existentially tight for any work-stealing scheduler that is parsimonious (those where processors steal only when their queues are empty); this class includes all prior work-stealing schedulers. We also present empirical measurements of the number of deviations incurred by a classic application of futures, Halstead's quicksort, using our parallel implementation of ML. Finally, we identify a family of applications that use futures and, in contrast to quicksort, incur significantly smaller overheads.