A classic problem in parallel computing is deciding whether to execute a task in parallel or sequentially. If small tasks are executed in parallel, task-creation overheads can be overwhelming; if large tasks are executed sequentially, processors may sit idle. Although this granularity problem is well known, it is not well understood: broadly applicable solutions remain elusive. We propose techniques for controlling granularity in implicitly parallel programming languages. Using a cost semantics for a general-purpose language in the style of the lambda calculus with support for parallelism, we show that task-creation overheads can indeed slow down parallel execution by a multiplicative factor. We then propose oracle scheduling, a technique that reduces these overheads by basing granularity decisions on estimates of task-execution times. We prove that, for a class of computations, oracle scheduling can reduce task-creation overheads to a small fraction of the work without adversely affecting available parallelism, thereby leading to efficient parallel executions. We realize oracle scheduling in practice through a combination of static and dynamic techniques: the programmer provides the asymptotic complexity of every function, and run-time profiling determines the implicit, architecture-specific constant factors. In our experiments, we reduced the overheads of parallelism to between 3 and 13 percent while achieving 6- to 10-fold speedups.
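To illustrate the idea, the following is a minimal, hypothetical sketch of an oracle-guided fork-join combinator, written against OCaml 5's Domain module (the paper's actual implementation differs). The names cutoff, kappa, predict, profile, and oracle_fork, the specific constants, and the moving-average profiling strategy are all illustrative assumptions, not the paper's API.

```ocaml
(* Sketch of oracle scheduling, assuming OCaml 5 (Domain module) and the
   unix library for timing. All names and constants below are illustrative. *)

let cutoff = 1e-3   (* assumed target sequential task size, in seconds *)
let kappa = ref 1e-8 (* profiled constant: predicted seconds per unit of cost *)

(* Predict a task's running time from its programmer-supplied asymptotic
   cost and the profiled architecture-specific constant factor. *)
let predict cost = !kappa *. float_of_int cost

(* Time a sequential run and refine kappa with a moving average: one
   simple stand-in for the paper's run-time profiling. *)
let profile cost f =
  let t0 = Unix.gettimeofday () in
  let r = f () in
  let dt = Unix.gettimeofday () -. t0 in
  if cost > 0 then kappa := 0.9 *. !kappa +. 0.1 *. (dt /. float_of_int cost);
  r

(* Fork-join combinator: create a parallel task only when the oracle
   predicts both branches are large enough to amortize task-creation
   overhead; otherwise run sequentially (and profile). *)
let oracle_fork ~cost_f ~cost_g f g =
  if predict cost_f > cutoff && predict cost_g > cutoff then
    let dg = Domain.spawn g in
    let rf = f () in
    (rf, Domain.join dg)
  else
    (profile cost_f f, profile cost_g g)

(* Example: divide-and-conquer array sum, where the programmer-supplied
   asymptotic cost of summing a range is simply its length. *)
let rec sum a lo hi =
  if hi - lo <= 1 then (if hi > lo then a.(lo) else 0.0)
  else
    let mid = (lo + hi) / 2 in
    let (s1, s2) =
      oracle_fork ~cost_f:(mid - lo) ~cost_g:(hi - mid)
        (fun () -> sum a lo mid) (fun () -> sum a mid hi)
    in
    s1 +. s2

let () =
  let a = Array.init 1_000_000 (fun i -> float_of_int i) in
  Printf.printf "sum = %.0f\n" (sum a 0 (Array.length a))
```

With these assumed parameters, only the top few levels of the recursion spawn domains; below the cutoff, branches run sequentially and their measured times refine the constant factor, mirroring the paper's split between programmer-supplied asymptotic costs and profiled constants. A production scheduler would also multiplex tasks over a fixed pool of workers rather than spawning one domain per task, since OCaml 5 domains are heavyweight and limited in number.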