DIB—a distributed implementation of backtracking
ACM Transactions on Programming Languages and Systems (TOPLAS)
A randomized parallel branch-and-bound procedure
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Speedup Versus Efficiency in Parallel Systems
IEEE Transactions on Computers
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Characterizations of parallelism in applications and their use in scheduling
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
A simple load balancing scheme for task allocation in parallel machines
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
The Processor Working Set and its Use in Scheduling Multiprocessor Systems
IEEE Transactions on Software Engineering
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Journal of the ACM (JACM)
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Randomized algorithms
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Elimination trees and the construction of pools and stacks: preliminary version
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A provable time and space efficient implementation of NESL
Proceedings of the first ACM SIGPLAN international conference on Functional programming
ACM Transactions on Computer Systems (TOCS)
On multiprocessor system scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Executing multithreaded programs efficiently
Using parallel program characteristics in dynamic processor allocation policies
Performance Evaluation
Space-Efficient Scheduling of Multithreaded Computations
SIAM Journal on Computing
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The performance of work stealing in multiprogrammed environments (extended abstract)
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Space-efficient scheduling of nested parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Introduction to Algorithms
Maximizing Speedup through Self-Tuning of Processor Allocation
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Multiprocessor Scheduling for High-Variability Service Time Distributions
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Dynamic vs. Static Quantum-Based Parallel Processor Allocation
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Implementation of multilisp: Lisp on a multiprocessor
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
Non-clairvoyant multiprocessor scheduling of jobs with changing execution characteristics
Journal of Scheduling - Special issue: On-line scheduling
The counting pyramid: an adaptive distributed counting scheme
Journal of Parallel and Distributed Computing
Adaptive scheduling with parallelism feedback
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
An Empirical Evaluation of Work Stealing with Parallelism Feedback
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Provably efficient two-level adaptive scheduling
JSSPP'06 Proceedings of the 12th international conference on Job scheduling strategies for parallel processing
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS)
Fine-Grained Task Scheduling Using Adaptive Data Structures
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Improved results for scheduling batched parallel jobs by using a generalized analysis framework
Journal of Parallel and Distributed Computing
Lazy binary-splitting: a run-time adaptive work-stealing scheduler
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Lightweight asynchrony using parasitic threads
Proceedings of the 5th ACM SIGPLAN workshop on Declarative aspects of multicore programming
Performance driven distributed scheduling of parallel hybrid computations
Theoretical Computer Science
Energy-efficient work-stealing language runtimes
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
Well-structured futures and cache locality
Proceedings of the 19th ACM SIGPLAN symposium on Principles and practice of parallel programming
We present an adaptive work-stealing thread scheduler, A-Steal, for fork-join multithreaded jobs, like those written using the Cilk multithreaded language or the Hood work-stealing library. The A-Steal algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and where the number of processors available to a particular job may vary during the job's execution. A-Steal provides continual parallelism feedback to a job scheduler in the form of processor requests, and the job must adapt its execution to the processors allotted to it. Assuming that the job scheduler never allots any job more processors than requested by the job's thread scheduler, A-Steal guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. Our analysis models the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the system environment and the job scheduler's administrative policies. We analyze the performance of A-Steal using "trim analysis," which allows us to prove that our thread scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority. To be precise, suppose that a job has work T1 and span (critical-path length) T∞. On a machine with P processors, A-Steal completes the job in expected O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all but the O(T∞ + L lg P) time steps having the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, P̃ ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed availability dominates the parallelism, the asymptotic running time of the job is nearly its span.
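The trimmed availability used in the bound above has a simple operational reading: discard the R time steps with the highest processor availability, then average the rest. The following is a minimal sketch of that computation; the function name, the sample availabilities, and the tie-breaking convention (drop exactly the r largest values) are illustrative assumptions, not part of the paper.

```python
def trimmed_availability(avail, r):
    """R-trimmed availability: the mean processor availability over all
    but the r time steps with the highest availability (illustrative
    convention: drop exactly the r largest values)."""
    if r >= len(avail):
        return 0.0  # assumed convention: nothing left to average
    kept = sorted(avail)[:len(avail) - r]
    return sum(kept) / len(kept)

# Hypothetical availabilities over 6 scheduling quanta; trim the 2 highest.
# sorted -> [2, 2, 4, 4, 8, 16]; keep [2, 2, 4, 4]; mean = 3.0
print(trimmed_availability([4, 8, 2, 16, 4, 2], 2))
```

Trimming the highest-availability steps (rather than, say, an ordinary mean) is what lets the analysis tolerate a small number of quanta in which the adversarial job scheduler hands the job far more processors than it can use.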