Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs

Authors:
E. Mohr, D. A. Kranz, and R. H. Halstead, Jr.
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1991

Abstract

When a parallel algorithm is written naturally, the resulting program often produces tasks of a finer grain than an implementation can exploit efficiently. Two solutions to the granularity problem that combine parallel tasks dynamically at runtime are discussed. The simpler load-based inlining method, in which tasks are combined based on the dynamic load level, is rejected in favor of the safer and more robust lazy task creation method, in which tasks are created only retroactively as processing resources become available. The strategies grew out of work on Mul-T, an efficient parallel implementation of Scheme, but could be used with other languages as well. Mul-T implementations of lazy task creation are described for two contrasting machines, and performance statistics that show the method's effectiveness are presented. Lazy task creation is shown to allow efficient execution of naturally expressed algorithms of a substantially finer grain than was possible with previous parallel Lisp systems.
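The abstract describes lazy task creation only in outline. The Python sketch below illustrates the idea under stated assumptions; it is not the Mul-T implementation. Python has no first-class continuations, so the deferred continuation is modeled as the pending sibling call `fib(n - 2)`. Each worker's lazy task queue is a `collections.deque` guarded by a plain lock, whereas a real runtime would keep the owner's fast path far cheaper. The names `Pool`, `Worker`, `LazyTask`, and `thief_loop` are hypothetical. At each potential fork the owner only pushes a queue entry and runs the first branch inline; a separate task comes into existence only if an idle worker steals that entry from the head (oldest end) of the queue.

```python
import random
import threading
import time
from collections import deque


class LazyTask:
    # Hypothetical name. Assumption: with no first-class continuations in
    # Python, the deferred continuation is modeled as the sibling call fib(n).
    def __init__(self, n):
        self.n = n
        self.result = None
        self.done = threading.Event()  # set only if a thief runs this entry


class Worker:
    """A processor with its own lazy task queue (hypothetical name)."""

    def __init__(self, pool):
        self.pool = pool
        self.queue = deque()          # owner pushes/pops at the tail
        self.lock = threading.Lock()  # thieves steal from the head
        self.pushes = 0               # potential tasks exposed by this worker

    def push(self, task):
        with self.lock:
            self.queue.append(task)
        self.pushes += 1  # only the owning thread updates this counter

    def pop(self, task):
        """Reclaim our newest entry; False means a thief already took it."""
        with self.lock:
            if self.queue and self.queue[-1] is task:
                self.queue.pop()
                return True
            return False

    def steal(self):
        """Called by a thief: take the oldest entry, if any."""
        with self.lock:
            return self.queue.popleft() if self.queue else None

    def fib(self, n):
        if n < 2:
            return n
        task = LazyTask(n - 2)
        self.push(task)          # expose the continuation instead of forking
        left = self.fib(n - 1)   # evaluate the future's body inline
        if self.pop(task):
            right = self.fib(n - 2)  # not stolen: no task was ever created
        else:
            task.done.wait()         # stolen: it became a real task; wait for it
            right = task.result
        return left + right

    def run_stolen(self, task):
        task.result = self.fib(task.n)  # thief uses its own lazy task queue
        with self.pool.stats_lock:
            self.pool.steals += 1
        task.done.set()


class Pool:
    """Worker 0 runs the program; the others act as idle thieves."""

    def __init__(self, n_workers):
        self.workers = [Worker(self) for _ in range(n_workers)]
        self.steals = 0
        self.stats_lock = threading.Lock()
        self.finished = threading.Event()

    def thief_loop(self, me):
        victims = [w for w in self.workers if w is not me]
        while not self.finished.is_set():
            task = random.choice(victims).steal()
            if task is not None:
                me.run_stolen(task)
            else:
                time.sleep(0)  # yield before trying another victim

    def fib(self, n):
        main, *idle = self.workers
        threads = [threading.Thread(target=self.thief_loop, args=(w,)) for w in idle]
        for t in threads:
            t.start()
        try:
            return main.fib(n)
        finally:
            self.finished.set()
            for t in threads:
                t.join()
```

For example, `Pool(4).fib(18)` returns 2584. The total of `pushes` across workers is fixed by the program (one per fork point), while `steals` depends on how often some worker happened to be idle, which is the sense in which tasks are created only retroactively. Because of CPython's global interpreter lock, the sketch demonstrates the bookkeeping rather than a speedup.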