MULTILISP: a language for concurrent symbolic computation
ACM Transactions on Programming Languages and Systems (TOPLAS)
A taxonomy of problems with fast parallel algorithms
Information and Control
Control of parallelism in the Manchester Dataflow Machine
Proc. of a conference on Functional programming languages and computer architecture
Guided self-scheduling: A practical scheduling scheme for parallel supercomputers
IEEE Transactions on Computers
PRESTO: a system for object-oriented parallel programming
Software—Practice & Experience
Resource requirements of dataflow programs
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Workcrews: an abstraction for controlling parallelism
International Journal of Parallel Programming
I-structures: data structures for parallel computing
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Amber system: parallel programming on a network of multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
A report on the Sisal language project
Journal of Parallel and Distributed Computing - Special issue: data-flow processing
Switch-stacks: a scheme for microtasking nested parallel loops
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
Factoring: a method for scheduling parallel loops
Communications of the ACM
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
Computation migration: enhancing locality for distributed-memory parallel systems
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Space-efficient scheduling of multithreaded computations
STOC '93 Proceedings of the twenty-fifth annual ACM symposium on Theory of computing
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Supporting dynamic data structures on distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
An analysis of dag-consistent distributed shared-memory algorithms
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Pthreads for dynamic and irregular parallelism
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
Storage Management in Virtual Tree Machines
IEEE Transactions on Computers
Lazy Task Creation: A Technique for Increasing the Granularity of Parallel Programs
IEEE Transactions on Parallel and Distributed Systems
Trapezoid Self-Scheduling: A Practical Scheduling Scheme for Parallel Compilers
IEEE Transactions on Parallel and Distributed Systems
Machine Learning
Compositional C++: Compositional Parallel Programming
Proceedings of the 5th International Workshop on Languages and Compilers for Parallel Computing
Cid: A Parallel, "Shared-Memory" C for Distributed-Memory Machines
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Hoard: a scalable memory allocator for multithreaded applications
ACM SIGPLAN Notices
Hoard: a scalable memory allocator for multithreaded applications
ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Effectively sharing a cache among threads
Proceedings of the sixteenth annual ACM symposium on Parallelism in algorithms and architectures
Data dependent loop scheduling based on genetic algorithms for distributed and shared memory systems
Journal of Parallel and Distributed Computing
Adaptive scheduling with parallelism feedback
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
The cache complexity of multithreaded cache oblivious algorithms
Proceedings of the eighteenth annual ACM symposium on Parallelism in algorithms and architectures
Adaptive work stealing with parallelism feedback
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Scheduling threads for constructive cache sharing on CMPs
Proceedings of the nineteenth annual ACM symposium on Parallel algorithms and architectures
Cost semantics for space usage in a parallel language
Proceedings of the 2007 workshop on Declarative aspects of multicore programming
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS)
Improved results for scheduling batched parallel jobs by using a generalized analysis framework
Journal of Parallel and Distributed Computing
Provably efficient two-level adaptive scheduling
JSSPP'06 Proceedings of the 12th international conference on Job scheduling strategies for parallel processing
Proceedings of the 15th ACM SIGPLAN international conference on Functional programming
Space-efficient scheduling of stochastically generated tasks
ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II
Dynamic workload balancing deques for branch and bound algorithms in the message passing interface
International Journal of High Performance Systems Architecture
Space-efficient scheduling of stochastically generated tasks
Information and Computation
Fast and lightweight support for nested parallelism on cluster-based embedded many-cores
DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Hi-index | 0.00 |
Many of today's high-level parallel languages support dynamic, fine-grained parallelism. These languages allow the user to expose all the parallelism in the program, which is typically of a much higher degree than the number of processors. Hence an efficient scheduling algorithm is required to assign computations to processors at runtime. Besides having low overheads and good load balancing, it is important for the scheduling algorithm to minimize the space usage of the parallel program. This article presents an on-line scheduling algorithm that is provably space efficient and time efficient for nested-parallel languages. For a computation with depth D and serial space requirement S1, the algorithm generates a schedule that requires at most S1 + O(K•D•p) space (including scheduler space) on p processors. Here, K is a user-adjustable runtime parameter specifying the net amount of memory that a thread may allocate before it is preempted by the scheduler. Adjusting the value of K provides a trade-off between the running time and the memory requirement of a parallel computation. To allow the scheduler to scale with the number of processors we also parallelize the scheduler and analyze the space and time bounds of the computation to include scheduling costs. In addition to showing that the scheduling algorithm is space and time efficient in theory, we demonstrate that it is effective in practice. We have implemented a runtime system that uses our algorithm to schedule lightweight parallel threads. The results of executing parallel programs on this system show that our scheduling algorithm significantly reduces memory usage compared to previous techniques, without compromising performance.