Many high-level parallel programming languages allow for fine-grained parallelism. As in the popular work-time framework for parallel algorithm design, programs written in such languages can express the full parallelism in the program without specifying the mapping of program tasks to processors. A common concern in executing such programs is to schedule tasks to processors dynamically so as to minimize not only the execution time, but also the amount of space (memory) needed. Without careful scheduling, the parallel execution on p processors can use a factor of p or more space than a sequential implementation of the same program.

This paper first identifies a class of parallel schedules that are provably efficient in both time and space. For any computation with w units of work and critical path length d, and for any sequential schedule that takes space s1, we provide a parallel schedule that takes fewer than w/p + d steps on p processors and requires less than s1 + p·d space. This matches the lower bound that we show, and significantly improves upon the best previous bound of s1·p space for the common case where d ≪ s1.

The paper then describes a scheduler for implementing high-level languages with nested parallelism that generates schedules in this class. During program execution, as the structure of the computation is revealed, the scheduler keeps track of the active tasks, allocates the tasks to the processors, and performs the necessary task synchronization. The scheduler is itself a parallel algorithm, and incurs at most a constant factor overhead in time and space, even when the scheduling granularity is individual units of work. The algorithm is the first efficient solution to the scheduling problem discussed here, even if space considerations are ignored.
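
To make the time bound concrete, here is a minimal Python sketch that simulates a greedy p-processor schedule on a small task graph and checks that it finishes in fewer than w/p + d steps. The abstract does not spell out the schedule class, so the priority rule used here (run the ready tasks that come earliest in a sequential depth-first execution) is an assumption made for illustration. The example graph `dag` and the names `one_df_order`, `greedy_schedule`, and `depth` are hypothetical, and the sketch models time only, not space.

```python
from collections import defaultdict


def count_parents(children):
    """Return a map from each node to its number of incoming edges."""
    parents = defaultdict(int)
    for kids in children.values():
        for kid in kids:
            parents[kid] += 1
    return parents


def one_df_order(children, root):
    """Number nodes in the order a sequential depth-first schedule runs them."""
    parents_left = count_parents(children)
    order, stack = {}, [root]
    while stack:
        node = stack.pop()
        order[node] = len(order)
        # Push newly ready children right-to-left so the leftmost runs first.
        for kid in reversed(children.get(node, [])):
            parents_left[kid] -= 1
            if parents_left[kid] == 0:
                stack.append(kid)
    return order


def greedy_schedule(children, root, p):
    """Each step runs up to p ready nodes, earliest depth-first number first.

    Assumption: this priority rule is illustrative, not taken from the text.
    """
    priority = one_df_order(children, root)
    parents_left = count_parents(children)
    ready, steps = [root], []
    while ready:
        ready.sort(key=priority.__getitem__)
        running, ready = ready[:p], ready[p:]
        steps.append(running)
        for node in running:
            for kid in children.get(node, []):
                parents_left[kid] -= 1
                if parents_left[kid] == 0:
                    ready.append(kid)
    return steps


def depth(children, root):
    """Critical path length d, counted in nodes (unit-work tasks)."""
    memo = {}

    def longest(node):
        if node not in memo:
            kids = children.get(node, [])
            memo[node] = 1 + max((longest(k) for k in kids), default=0)
        return memo[node]

    return longest(root)


# Hypothetical nested fork-join graph: a forks b and c, b forks d and e, etc.
dag = {"a": ["b", "c"], "b": ["d", "e"], "d": ["f"], "e": ["f"],
       "c": ["g"], "f": ["h"], "g": ["h"]}

w = len(one_df_order(dag, "a"))   # total work: one unit per node
d = depth(dag, "a")
for p in (1, 2, 3, 4):
    steps = greedy_schedule(dag, "a", p)
    assert len(steps) < w / p + d
    print(p, len(steps), steps)
```

With p = 1 the simulated schedule reduces to the sequential depth-first order, which is why that order is a natural baseline for comparing the space of parallel and sequential executions.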