Space-efficient scheduling of nested parallelism

Authors:
Girija J. Narlikar; Guy E. Blelloch
Affiliations:
Carnegie Mellon Univ., Pittsburgh, PA; Carnegie Mellon Univ., Pittsburgh, PA
Venue:
ACM Transactions on Programming Languages and Systems (TOPLAS)
Year:
1999

Abstract

Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence, an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, the scheduling algorithm should minimize the space usage of the parallel program. This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K·D·p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors, we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.
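
The role of K in the abstract can be illustrated with a small sketch. This is not the authors' runtime system; it is a simplified model under stated assumptions. The function names `space_bound` and `count_preemptions`, the constant `c` standing in for the factor hidden by the O() notation, and the rule that a thread receives a fresh quota each time it is rescheduled are all hypothetical choices made for illustration.

```python
def space_bound(s1, k, depth, p, c=1):
    """Evaluate S1 + c*K*D*p.

    c is a hypothetical stand-in for the constant hidden in O(K*D*p);
    the abstract does not give its value.
    """
    return s1 + c * k * depth * p


def count_preemptions(trace, k):
    """Replay one thread's allocation trace against a memory quota k.

    trace holds signed byte counts: positive for an allocation,
    negative for a deallocation. The thread is counted as preempted
    whenever its net allocation since it was last scheduled exceeds k.
    Assumption: the quota starts over from zero after each preemption.
    """
    net = 0
    preemptions = 0
    for nbytes in trace:
        net += nbytes
        if net > k:
            preemptions += 1
            net = 0  # fresh quota once the thread is rescheduled
    return preemptions


# A made-up trace: three small allocations, one free, one large allocation.
trace = [40, 40, 40, -100, 200]
loose = count_preemptions(trace, k=100)
tight = count_preemptions(trace, k=50)
bound = space_bound(s1=1000, k=100, depth=10, p=4)
```

With this trace, the quota of 100 leads to 1 preemption and the quota of 50 leads to 2, while `space_bound` with c = 1 gives 5000. This mirrors the trade-off described above: a smaller K tightens the space bound but interrupts threads more often, which may cost running time.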