This paper considers the problem of scheduling dynamic parallel computations to achieve linear speedup without using significantly more space per processor than that required for a single-processor execution. Using a new graph-theoretic model of multithreaded computation, execution efficiency is quantified by three measures: $T_1$ is the time required to execute the computation on one processor, $T_\infty$ is the time required by an infinite number of processors, and $S_1$ is the space required to execute the computation on one processor. A computation executed on $P$ processors is time-efficient if the time is $O(T_1/P + T_\infty)$, that is, it achieves linear speedup when $P=O(T_1/T_\infty)$, and it is space-efficient if it uses $O(S_1P)$ total space, that is, the space per processor is within a constant factor of that required for a one-processor execution.

The first result derived from this model shows that there exist multithreaded computations for which no execution schedule can simultaneously achieve efficient time and efficient space. But by restricting attention to "strict" computations (those in which all arguments to a procedure must be available before the procedure can be invoked), much more positive results are obtainable. Specifically, for any strict multithreaded computation, a simple online algorithm can compute a schedule that is both time-efficient and space-efficient. Unfortunately, because the algorithm uses a global queue, the overhead of computing the schedule can be substantial. This problem is overcome by a decentralized algorithm that can compute and execute a $P$-processor schedule online in expected time $O(T_1/P + T_\infty\lg P)$ and worst-case space $O(S_1P\lg P)$, including overhead costs.
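To make the time measures concrete, the following is a minimal sketch that computes $T_1$ and $T_\infty$ for a small dependency graph and counts the steps of a simple greedy schedule on $P$ processors. It is an illustration under stated assumptions, not the paper's algorithm: every node is assumed to take unit time, the graph is a hypothetical fork-join example, the greedy scheduler is a stand-in chosen for simplicity, and space ($S_1$) is not modeled at all. The names `work_and_span` and `greedy_steps` are made up for this example.

```python
from collections import deque


def _indegrees(dag):
    """Count incoming edges for every node of a successor-list DAG."""
    indeg = {v: 0 for v in dag}
    for v in dag:
        for w in dag[v]:
            indeg[w] += 1
    return indeg


def work_and_span(dag):
    """Return (T_1, T_inf) assuming each node costs one time unit.

    T_1 is the number of nodes; T_inf is the number of nodes on the
    longest dependency chain.
    """
    indeg = _indegrees(dag)
    depth = {v: 1 for v in dag}
    ready = deque(v for v in dag if indeg[v] == 0)
    while ready:
        v = ready.popleft()
        for w in dag[v]:
            depth[w] = max(depth[w], depth[v] + 1)
            indeg[w] -= 1
            if indeg[w] == 0:
                ready.append(w)
    return len(dag), max(depth.values())


def greedy_steps(dag, p):
    """Steps taken by a greedy schedule: run up to p ready nodes per step."""
    indeg = _indegrees(dag)
    ready = [v for v in dag if indeg[v] == 0]
    steps = 0
    while ready:
        running, ready = ready[:p], ready[p:]
        steps += 1
        for v in running:
            for w in dag[v]:
                indeg[w] -= 1
                if indeg[w] == 0:
                    ready.append(w)
    return steps


# Hypothetical fork-join computation: s spawns four children that join at t.
dag = {"s": ["a", "b", "c", "d"],
       "a": ["t"], "b": ["t"], "c": ["t"], "d": ["t"],
       "t": []}

t1, t_inf = work_and_span(dag)
for p in (1, 2, 4):
    print(p, greedy_steps(dag, p), t1 / p + t_inf)
```

For this graph $T_1 = 6$ and $T_\infty = 3$, and each greedy schedule finishes within $T_1/P + T_\infty$ steps, which is the shape of the time-efficiency bound above. The sketch says nothing about the space bounds or about the overhead of a global queue, which are the harder parts the abstract addresses.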