Lazy task creation: a technique for increasing the granularity of parallel programs

Authors:
Eric Mohr;David A. Kranz;Robert H. Halstead, Jr.
Affiliations:
Yale University;Laboratory for Computer Science, M.I.T.;DEC Cambridge Research Lab
Venue:
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
Year:
1990

Citing 18
Cited 59

The Manchester prototype dataflow computer

Communications of the ACM - Special section on computer architecture
MULTILISP: a language for concurrent symbolic computation

ACM Transactions on Programming Languages and Systems (TOPLAS)
Serial combinators: “optimal” grains of parallelism

Proc. of a conference on Functional programming languages and computer architecture
ORBIT: an optimizing compiler for scheme

SIGPLAN '86 Proceedings of the 1986 SIGPLAN symposium on Compiler construction
Dataflow architectures

Annual review of computer science vol. 1, 1986
Para-Functional Programming

Computer
Managing stack frames in Smalltalk

SIGPLAN '87 Papers of the Symposium on Interpreters and interpretive techniques
Control of parallelism in the Manchester Dataflow Machine

Proc. of a conference on Functional programming languages and computer architecture
An assessment of multilisp: lessons from experience

International Journal of Parallel Programming
Preliminary results with the initial implementation of Qlisp

LFP '88 Proceedings of the 1988 ACM conference on LISP and functional programming
Workcrews: an abstraction for controlling parallelism

International Journal of Parallel Programming
Multiprocessor execution of functional programs

International Journal of Parallel Programming
Mul-T: a high-performance parallel Lisp

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Connection Machine Lisp: fine-grained parallel symbolic processing

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
APRIL: a processor architecture for multiprocessing

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Para-functional programming: a paradigm for programming multiprocessor systems

POPL '86 Proceedings of the 13th ACM SIGACT-SIGPLAN symposium on Principles of programming languages
Queue-based multi-processing LISP

LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Orbit: an optimizing compiler for scheme

Orbit: an optimizing compiler for scheme

Fast parallel implementation of lazy languages—the EQUALS experience

LFP '92 Proceedings of the 1992 ACM conference on LISP and functional programming
A foundation for an efficient multi-threaded scheme system

LFP '92 Proceedings of the 1992 ACM conference on LISP and functional programming
A customizable substrate for concurrent languages

PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Heterogeneous parallel programming in Jade

Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Static dependent costs for estimating execution time

LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
Using the run-time sizes of data structures to guide parallel-thread creation

LFP '94 Proceedings of the 1994 ACM conference on LISP and functional programming
The semantics of future and its use in program optimization

POPL '95 Proceedings of the 22nd ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Commutativity analysis: a new analysis framework for parallelizing compilers

PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
The semantics of Scheme with future

Proceedings of the first ACM SIGPLAN international conference on Functional programming
Dynamic feedback: an effective technique for adaptive computing

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

PPOPP '97 Proceedings of the sixth ACM SIGPLAN symposium on Principles and practice of parallel programming
Commutativity analysis: a new analysis technique for parallelizing compilers

ACM Transactions on Programming Languages and Systems (TOPLAS)
The design, implementation, and evaluation of Jade

ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic parallelization of divide and conquer algorithms

Proceedings of the seventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Transparent communication for distributed objects in Java

JAVA '99 Proceedings of the ACM 1999 conference on Java Grande
Eliminating synchronization bottlenecks in object-based programs using adaptive replication

ICS '99 Proceedings of the 13th international conference on Supercomputing
APRIL: a processor architecture for multiprocessing

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Effective fine-grain synchronization for automatically parallelized programs using optimistic synchronization primitives

ACM Transactions on Computer Systems (TOCS)
Efficient load balancing for wide-area divide-and-conquer applications

PPoPP '01 Proceedings of the eighth ACM SIGPLAN symposium on Principles and practices of parallel programming
A Syntactic Theory of Dynamic Binding

Higher-Order and Symbolic Computation
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs

IEEE Transactions on Parallel and Distributed Systems
Eliminating synchronization bottlenecks using adaptive replication

ACM Transactions on Programming Languages and Systems (TOPLAS)
Predicting Scalability of Parallel Garbage Collectors on Shared Memory Multiprocessors

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Satin: Efficient Parallel Divide-and-Conquer in Java

Euro-Par '00 Proceedings from the 6th International Euro-Par Conference on Parallel Processing
Towards a Computational Model for UFO

PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
Commutativity Analysis: A Technique for Automatically Parallelizing Pointer-Based Computations

IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Exploiting Implicit Parallelism in Functional Programs with SLAM

IFL '00 Selected Papers from the 12th International Workshop on Implementation of Functional Languages
The semantics of future and an application

Journal of Functional Programming
Safe futures for Java

OOPSLA '05 Proceedings of the 20th annual ACM SIGPLAN conference on Object-oriented programming, systems, languages, and applications
A unifying link abstraction for wireless sensor networks

Proceedings of the 3rd international conference on Embedded networked sensor systems
Adaptive scheduling with parallelism feedback

Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Adaptive work stealing with parallelism feedback

Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Manticore: a heterogeneous parallel language

Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Carbon: architectural support for fine-grained parallelism on chip multiprocessors

Proceedings of the 34th annual international symposium on Computer architecture
Status report: the manticore project

ML '07 Proceedings of the 2007 workshop on Workshop on ML
Quasi-static scheduling for safe futures

Proceedings of the 13th ACM SIGPLAN Symposium on Principles and practice of parallel programming
Adaptive work-stealing with parallelism feedback

ACM Transactions on Computer Systems (TOCS)
A scheduling framework for general-purpose parallel languages

Proceedings of the 13th ACM SIGPLAN international conference on Functional programming
An adaptive cut-off for task parallelism

Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Efficient, portable implementation of asynchronous multi-place programs

Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Evaluating OpenMP 3.0 Run Time Systems on Unbalanced Task Graphs

IWOMP '09 Proceedings of the 5th International Workshop on OpenMP: Evolving OpenMP in an Age of Extreme Parallelism
Exceptionally Safe Futures

COORDINATION '09 Proceedings of the 11th International Conference on Coordination Models and Languages
Beyond nested parallelism: tight bounds on work-stealing overheads for parallel futures

Proceedings of the twenty-first annual symposium on Parallelism in algorithms and architectures
ClearPath: highly parallel collision avoidance for multi-agent simulation

Proceedings of the 2009 ACM SIGGRAPH/Eurographics Symposium on Computer Animation
Lazy binary-splitting: a run-time adaptive work-stealing scheduler

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Lightweight asynchrony using parasitic threads

Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
Satin: A high-level and efficient grid programming model

ACM Transactions on Programming Languages and Systems (TOPLAS)
Evaluation of OpenMP task scheduling strategies

IWOMP'08 Proceedings of the 4th international conference on OpenMP in a new era of parallelism
Exploiting fine-grained parallelism on cell processors

Euro-Par'10 Proceedings of the 16th international Euro-Par conference on Parallel processing: Part II
Programming in Manticore, a heterogenous parallel functional language

CEFP'09 Proceedings of the Third summer school conference on Central European functional programming school
Implicitly threaded parallelism in manticore

Journal of Functional Programming
Dynamic workload balancing deques for branch and bound algorithms in the message passing interface

International Journal of High Performance Systems Architecture
Oracle scheduling: controlling granularity in implicitly parallel languages

Proceedings of the 2011 ACM international conference on Object oriented programming systems languages and applications
A work-stealing scheduler for X10's task parallelism with suspension

Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming
Dependence analysis for safe futures

Science of Computer Programming
On the granularity of divide-and-conquer parallelism

FP'95 Proceedings of the 1995 international conference on Functional Programming
Design and implementation of a customizable work stealing scheduler

Proceedings of the 3rd International Workshop on Runtime and Operating Systems for Supercomputers
Energy-efficient work-stealing language runtimes

Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Friendly barriers: efficient work-stealing with return barriers

Proceedings of the 10th ACM SIGPLAN/SIGOPS international conference on Virtual execution environments

Abstract

Many parallel algorithms are naturally expressed at a fine level of granularity, often finer than a MIMD parallel system can exploit efficiently. Most builders of parallel systems have looked to either the programmer or a parallelizing compiler to increase the granularity of such algorithms. In this paper we explore a third approach to the granularity problem by analyzing two strategies for combining parallel tasks dynamically at run-time. We reject the simpler load-based inlining method, where tasks are combined based on dynamic load level, in favor of the safer and more robust lazy task creation method, where tasks are created only retroactively as processing resources become available.

These strategies grew out of work on Mul-T [14], an efficient parallel implementation of Scheme, but could be used with other applicative languages as well. We describe our Mul-T implementations of lazy task creation for two contrasting machines, and present performance statistics which show the method's effectiveness. Lazy task creation allows efficient execution of naturally expressed algorithms of a substantially finer grain than possible with previous parallel Lisp systems.
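The idea of creating tasks only retroactively can be illustrated with a small single-threaded simulation. This is a minimal sketch in Python rather than Mul-T, and it is not the authors' implementation: the names `eager_sum`, `lazy_sum`, `try_steal`, the `stats` dictionary, and the choice to model an idle processor as a counter that steals at leaf calls are all assumptions made for illustration. The sketch sums a range by recursive halving. The eager version creates a task at every split, while the lazy version runs each child inline and only counts a task when a simulated idle processor steals the oldest piece of pending parent work.

```python
from collections import deque


def eager_sum(lo, hi, stats):
    """Sum the integers in [lo, hi) (hi > lo), creating a task at every split."""
    if hi - lo == 1:
        return lo
    mid = (lo + hi) // 2
    stats["tasks"] += 1  # eager: one task per split, however small
    return eager_sum(lo, mid, stats) + eager_sum(mid, hi, stats)


def try_steal(pending, stats):
    """Simulate an idle processor taking the oldest (largest) pending piece."""
    if stats["idle"] > 0 and pending:
        record = pending.popleft()
        lo, hi = record["range"]
        # The thief's work is simply run sequentially in this simulation.
        record["result"] = sum(range(lo, hi))
        stats["idle"] -= 1
        stats["tasks"] += 1  # a task exists only because it was stolen


def lazy_sum(lo, hi, pending, stats):
    """Sum [lo, hi) (hi > lo), creating a task only when work is stolen."""
    if hi - lo == 1:
        try_steal(pending, stats)  # hypothetical steal point for the simulation
        return lo
    mid = (lo + hi) // 2
    record = {"range": (mid, hi), "result": None}
    pending.append(record)  # advertise the parent's remaining work
    left = lazy_sum(lo, mid, pending, stats)  # run the child inline
    if record["result"] is None:
        # Not stolen: retract the record and continue inline, no task created.
        popped = pending.pop()
        assert popped is record
        right = lazy_sum(mid, hi, pending, stats)
    else:
        right = record["result"]  # stolen: the task was created retroactively
    return left + right


if __name__ == "__main__":
    eager_stats = {"tasks": 0}
    lazy_stats = {"tasks": 0, "idle": 3}
    print(eager_sum(0, 1024, eager_stats), eager_stats["tasks"])
    print(lazy_sum(0, 1024, deque(), lazy_stats), lazy_stats["tasks"])
```

With 1024 elements and three simulated idle processors, the eager version creates 1023 tasks while the lazy version creates 3, one per steal, and both return the same sum. Stealing from the oldest end of the deque hands the thief the largest available piece of work, which is what raises the effective granularity in this sketch.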