A classic problem in parallel computing is deciding whether to execute a task in parallel or sequentially. If small tasks are executed in parallel, task-creation overheads can be overwhelming; if large tasks are executed sequentially, processors may sit idle. Although this granularity problem is well known, it is not well understood: broadly applicable solutions remain elusive. We propose techniques for controlling granularity in implicitly parallel programming languages. Using a cost semantics for a general-purpose language in the style of the lambda calculus with support for parallelism, we show that task-creation overheads can indeed slow down parallel execution by a multiplicative factor. We then propose oracle scheduling, a technique that reduces these overheads by basing granularity decisions on estimates of task-execution times. We prove that, for a class of computations, oracle scheduling can reduce task-creation overheads to a small fraction of the work without adversely affecting available parallelism, thereby leading to efficient parallel executions. We realize oracle scheduling in practice through a combination of static and dynamic techniques: the programmer provides the asymptotic complexity of every function, and run-time profiling determines the implicit, architecture-specific constant factors. In our experiments, we reduced the overheads of parallelism to between 3 and 13 percent while achieving 6- to 10-fold speedups.
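To illustrate the idea, the following is a minimal, hypothetical sketch of an oracle-guided fork-join combinator, written against OCaml 5's Domain module (the paper's actual implementation differs). The names cutoff, kappa, predict, profile, and oracle_fork, the specific constants, and the moving-average profiling strategy are all illustrative assumptions, not the paper's API.

```ocaml
(* Sketch of oracle scheduling, assuming OCaml 5 (Domain module) and the
   unix library for timing. All names and constants below are illustrative. *)

let cutoff = 1e-3   (* assumed target sequential task size, in seconds *)
let kappa = ref 1e-8 (* profiled constant: predicted seconds per unit of cost *)

(* Predict a task's running time from its programmer-supplied asymptotic
   cost and the profiled architecture-specific constant factor. *)
let predict cost = !kappa *. float_of_int cost

(* Time a sequential run and refine kappa with a moving average: one
   simple stand-in for the paper's run-time profiling. *)
let profile cost f =
  let t0 = Unix.gettimeofday () in
  let r = f () in
  let dt = Unix.gettimeofday () -. t0 in
  if cost > 0 then kappa := 0.9 *. !kappa +. 0.1 *. (dt /. float_of_int cost);
  r

(* Fork-join combinator: create a parallel task only when the oracle
   predicts both branches are large enough to amortize task-creation
   overhead; otherwise run sequentially (and profile). *)
let oracle_fork ~cost_f ~cost_g f g =
  if predict cost_f > cutoff && predict cost_g > cutoff then
    let dg = Domain.spawn g in
    let rf = f () in
    (rf, Domain.join dg)
  else
    (profile cost_f f, profile cost_g g)

(* Example: divide-and-conquer array sum, where the programmer-supplied
   asymptotic cost of summing a range is simply its length. *)
let rec sum a lo hi =
  if hi - lo <= 1 then (if hi > lo then a.(lo) else 0.0)
  else
    let mid = (lo + hi) / 2 in
    let (s1, s2) =
      oracle_fork ~cost_f:(mid - lo) ~cost_g:(hi - mid)
        (fun () -> sum a lo mid) (fun () -> sum a mid hi)
    in
    s1 +. s2

let () =
  let a = Array.init 1_000_000 (fun i -> float_of_int i) in
  Printf.printf "sum = %.0f\n" (sum a 0 (Array.length a))
```

With these assumed parameters, only the top few levels of the recursion spawn domains; below the cutoff, branches run sequentially and their measured times refine the constant factor, mirroring the paper's split between programmer-supplied asymptotic costs and profiled constants. A production scheduler would also multiplex tasks over a fixed pool of workers rather than spawning one domain per task, since OCaml 5 domains are heavyweight and limited in number.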