Provably efficient scheduling for languages with fine-grained parallelism

Authors:
Guy E. Blelloch; Phillip B. Gibbons; Yossi Matias
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA; Bell Labs, Murray Hill, NJ; Bell Labs, Murray Hill, NJ
Venue:
Journal of the ACM (JACM)
Year:
1999


Abstract

Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more space than a sequential implementation of the same program.

This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1.

The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates the tasks to the processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
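The time bound in the abstract can be illustrated with a small simulation. The sketch below is not the paper's scheduler: it is a plain greedy list scheduler over a task DAG that, at each step, runs up to p ready tasks, preferring those that come earliest in a sequential depth-first order. Prioritizing by the sequential order is an assumption made here for illustration, and the names `sequential_order`, `greedy_schedule`, `depth`, and `EXAMPLE_DAG` are hypothetical. Space usage is not modeled at all; only the step count is compared against w/p + d, with d counted as the number of tasks on the longest path.

```python
def in_degrees(dag):
    """Count incoming edges for every node of a DAG given as node -> successors."""
    indeg = {v: 0 for v in dag}
    for v in dag:
        for u in dag[v]:
            indeg[u] += 1
    return indeg


def sequential_order(dag):
    """Return a depth-first topological order (a one-processor schedule)."""
    indeg = in_degrees(dag)
    stack = [v for v in dag if indeg[v] == 0]
    order = []
    while stack:
        v = stack.pop()
        order.append(v)
        # Push children in reverse so the first child is visited first.
        for u in reversed(dag[v]):
            indeg[u] -= 1
            if indeg[u] == 0:
                stack.append(u)
    return order


def depth(dag):
    """Number of nodes on the longest path (critical path length d)."""
    longest = {v: 1 for v in dag}
    for v in sequential_order(dag):
        for u in dag[v]:
            longest[u] = max(longest[u], longest[v] + 1)
    return max(longest.values())


def greedy_schedule(dag, p):
    """Greedy schedule: each step runs up to p ready nodes, earliest
    sequential rank first. Returns a list of steps (lists of nodes)."""
    rank = {v: i for i, v in enumerate(sequential_order(dag))}
    indeg = in_degrees(dag)
    ready = [v for v in dag if indeg[v] == 0]
    steps = []
    while ready:
        ready.sort(key=rank.__getitem__)
        batch, ready = ready[:p], ready[p:]
        steps.append(batch)
        # Nodes enabled by this batch become ready for the next step.
        for v in batch:
            for u in dag[v]:
                indeg[u] -= 1
                if indeg[u] == 0:
                    ready.append(u)
    return steps


# Hypothetical fork-join computation: one fork, four chains of two tasks, one join.
EXAMPLE_DAG = {
    "fork": ["a1", "b1", "c1", "d1"],
    "a1": ["a2"], "b1": ["b2"], "c1": ["c2"], "d1": ["d2"],
    "a2": ["join"], "b2": ["join"], "c2": ["join"], "d2": ["join"],
    "join": [],
}

if __name__ == "__main__":
    w, d = len(EXAMPLE_DAG), depth(EXAMPLE_DAG)
    for p in (1, 2, 4):
        steps = greedy_schedule(EXAMPLE_DAG, p)
        print(f"p={p}: {len(steps)} steps, bound w/p + d = {w / p + d}")
```

With p = 1 the greedy schedule reduces to the sequential depth-first order itself, which is why that order is a natural priority to try. Any greedy schedule of this kind stays under w/p + d steps; the space bound of s1 + p·d stated in the abstract depends on the paper's specific class of schedules and is not demonstrated by this sketch.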