DIB—a distributed implementation of backtracking
ACM Transactions on Programming Languages and Systems (TOPLAS)
A randomized parallel branch-and-bound procedure
STOC '88 Proceedings of the twentieth annual ACM symposium on Theory of computing
Speedup Versus Efficiency in Parallel Systems
IEEE Transactions on Computers
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Characterizations of parallelism in applications and their use in scheduling
SIGMETRICS '89 Proceedings of the 1989 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
The performance of multiprogrammed multiprocessor scheduling algorithms
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A simple load balancing scheme for task allocation in parallel machines
SPAA '91 Proceedings of the third annual ACM symposium on Parallel algorithms and architectures
The Processor Working Set and its Use in Scheduling Multiprocessor Systems
IEEE Transactions on Software Engineering
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Application scheduling and processor allocation in multiprogrammed parallel processing systems
Performance Evaluation - Special issue: performance modeling of parallel processing systems
Robust partitioning policies of multiprocessor systems
Performance Evaluation - Special issue: performance modeling of parallel processing systems
Journal of the ACM (JACM)
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Randomized algorithms
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
A provable time and space efficient implementation of NESL
Proceedings of the first ACM SIGPLAN international conference on Functional programming
On multiprocessor system scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Cilk: an efficient multithreaded runtime system
Journal of Parallel and Distributed Computing - Special issue on multithreading for multiprocessors
Executing multithreaded programs efficiently
Executing multithreaded programs efficiently
Using parallel program characteristics in dynamic processor allocation policies
Performance Evaluation
Exploiting process lifetime distributions for dynamic load balancing
ACM Transactions on Computer Systems (TOCS)
Space-Efficient Scheduling of Multithreaded Computations
SIAM Journal on Computing
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
The performance of work stealing in multiprogrammed environments (extended abstract)
SIGMETRICS '98/PERFORMANCE '98 Proceedings of the 1998 ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Preemptive scheduling of parallel jobs on multiprocessors
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Space-efficient scheduling of nested parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
Load-balancing heuristics and process behavior
SIGMETRICS '86/PERFORMANCE '86 Proceedings of the 1986 ACM SIGMETRICS joint international conference on Computer performance modelling, measurement and evaluation
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
The data locality of work stealing
Proceedings of the twelfth annual ACM symposium on Parallel algorithms and architectures
Non-blocking steal-half work queues
Proceedings of the twenty-first annual symposium on Principles of distributed computing
Introduction to Algorithms
A parallel workload model and its implications for processor allocation
Cluster Computing
Maximizing Speedup through Self-Tuning of Processor Allocation
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
A Model for Moldable Supercomputer Jobs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Multiprocessor Scheduling for High-Variability Service Time Distributions
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
On the Benefits and Limitations of Dynamic Partitioning in Parallel Computer Systems
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Analysis of Non-Work-Conserving Processor Partitioning Policies
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Packing Schemes for Gang Scheduling
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Using Runtime Measured Workload Characteristics in Parallel Processor Scheduling
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Dynamic vs. Static Quantum-Based Parallel Processor Allocation
IPPS '96 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Implementation of multilisp: Lisp on a multiprocessor
LFP '84 Proceedings of the 1984 ACM Symposium on LISP and functional programming
Executing functional programs on a virtual tree of processors
FPCA '81 Proceedings of the 1981 conference on Functional programming languages and computer architecture
The workload on parallel supercomputers: modeling the characteristics of rigid jobs
Journal of Parallel and Distributed Computing
Non-clairvoyant multiprocessor scheduling of jobs with changing execution characteristics
Journal of Scheduling - Special issue: On-line scheduling
Dynamic circular work-stealing deque
Proceedings of the seventeenth annual ACM symposium on Parallelism in algorithms and architectures
Adaptive scheduling with parallelism feedback
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
An Empirical Evaluation of Work Stealing with Parallelism Feedback
ICDCS '06 Proceedings of the 26th IEEE International Conference on Distributed Computing Systems
A dynamic-sized nonblocking work stealing deque
Distributed Computing - Special issue: DISC 04
Adaptive work stealing with parallelism feedback
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
On the costs and benefits of stochasticity in stream processing
Proceedings of the 47th Design Automation Conference
Hardware/software support for adaptive work-stealing in on-chip multiprocessor
Journal of Systems Architecture: the EUROMICRO Journal
Vertical stealing: robust, locality-aware do-all workload distribution for 3D MPSoCs
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Space-efficient scheduling of stochastically generated tasks
ICALP'10 Proceedings of the 37th international colloquium conference on Automata, languages and programming: Part II
Space-efficient scheduling of stochastically generated tasks
Information and Computation
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
Processor allocation for optimistic parallelization of irregular programs
ICCSA'12 Proceedings of the 12th international conference on Computational Science and Its Applications - Volume Part I
Palirria: Accurate On-line Parallelism Estimation for Adaptive Work-Stealing
Proceedings of Programming Models and Applications on Multicores and Manycores
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures
Proceedings of Programming Models and Applications on Multicores and Manycores
Multiprocessor scheduling in a shared multiprogramming environment can be structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level thread scheduler schedules the work of a job on its allotted processors. We present a randomized work-stealing thread scheduler for fork-join multithreaded jobs that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our A-STEAL algorithm is appropriate for large parallel servers where many jobs share a common multiprocessor resource and in which the number of processors available to a particular job may vary during the job's execution. Assuming that the job scheduler never allots a job more processors than requested by the job's thread scheduler, A-STEAL guarantees that the job completes in near-optimal time while utilizing at least a constant fraction of the allotted processors. We model the job scheduler as the thread scheduler's adversary, challenging the thread scheduler to be robust to the operating environment as well as to the job scheduler's administrative policies. For example, the job scheduler might make a large number of processors available exactly when the job has little use for them. To analyze the performance of our adaptive thread scheduler under this stringent adversarial assumption, we introduce a new technique called trim analysis, which allows us to prove that our thread scheduler performs poorly on no more than a small number of time steps, exhibiting near-optimal behavior on the vast majority. More precisely, suppose that a job has work T1 and span T∞. On a machine with P processors, A-STEAL completes the job in an expected duration of O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum, and P̃ denotes the O(T∞ + L lg P)-trimmed availability.
This quantity is the average of the processor availability over all time steps except the O(T∞ + L lg P) time steps that have the highest processor availability. When the job's parallelism dominates the trimmed availability, that is, P̃ ≪ T1/T∞, the job achieves nearly perfect linear speedup. Conversely, when the trimmed mean dominates the parallelism, the asymptotic running time of the job is nearly the length of its span, which is optimal. We measured the performance of A-STEAL on a simulated multiprocessor system using synthetic workloads. For jobs with sufficient parallelism, our experiments confirm that A-STEAL provides almost perfect linear speedup across a variety of processor availability profiles. We compared A-STEAL with the ABP algorithm, an adaptive work-stealing thread scheduler developed by Arora et al. [1998] which does not employ parallelism feedback. On moderately to heavily loaded machines with large numbers of processors, A-STEAL typically completed jobs more than twice as quickly as ABP, despite being allotted the same number or fewer processors on every step, while wasting only 10% of the processor cycles wasted by ABP.
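The trimmed availability defined above (the mean processor availability after discarding the time steps with the highest availability) can be illustrated with a minimal sketch. The function name `trimmed_availability`, the list-based availability profile, and the choice to return 0.0 when every step is trimmed are assumptions made for illustration. They are not taken from the paper, and the sketch takes the number of trimmed steps `r` as a plain integer rather than deriving it from the O(T∞ + L lg P) bound.

```python
def trimmed_availability(availability, r):
    """Mean per-step processor availability after dropping the r highest steps.

    availability: processors available at each time step (hypothetical input).
    r: number of highest-availability time steps to discard.
    """
    if r < 0:
        raise ValueError("r must be non-negative")
    if r >= len(availability):
        # Nothing remains after trimming; returning 0.0 is an assumption
        # of this sketch, not a convention taken from the paper.
        return 0.0
    # Sort ascending and keep everything except the r largest values.
    kept = sorted(availability)[: len(availability) - r]
    return sum(kept) / len(kept)


# A hypothetical profile: mostly 2 processors, with two brief spikes to 64.
profile = [2, 2, 64, 2, 2, 64, 2, 2]
print(trimmed_availability(profile, 0))  # 17.5 (plain mean)
print(trimmed_availability(profile, 2))  # 2.0 (both spikes trimmed)
```

In this hypothetical profile, two short spikes inflate the plain mean to 17.5, while trimming them gives 2.0. This matches the adversarial scenario in the abstract, where the job scheduler offers many processors exactly when the job has little use for them.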