Adaptive scheduling with parallelism feedback

  • Authors:
  • Kunal Agrawal, Yuxiong He, Wen Jing Hsu, Charles E. Leiserson

  • Affiliations:
  • Massachusetts Institute of Technology, Cambridge, MA (all authors)

  • Venue:
  • Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '06)
  • Year:
  • 2006

Abstract

Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.

We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near-optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.

Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, exhibiting near-optimal behavior on the vast majority.

To be precise, suppose that a job has work T1 and critical-path length T∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps, excluding the O(T∞ + L lg P) time steps with the highest processor availability. When T1/T∞ ≫ P̃ (the job's parallelism dominates the O(T∞ + L lg P)-trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when T1/T∞ ≪ P̃, the asymptotic running time of the job is nearly the length of its critical path.
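As a concrete reading of the trimmed-availability definition above, the sketch below computes an R-trimmed availability from a per-step availability trace: drop the R steps with the highest availability and average the rest. The function name and the sample trace are illustrative assumptions, not from the paper.

```python
def trimmed_availability(avail, r):
    """R-trimmed availability: the mean processor availability over all
    time steps, excluding the r steps with the highest availability."""
    kept = sorted(avail)[:max(len(avail) - r, 0)]
    return sum(kept) / len(kept) if kept else 0.0

# Two adversarial spikes of 64 processors are trimmed away, so the
# trimmed availability reflects the steady allotment of 4 processors.
print(trimmed_availability([4, 4, 64, 4, 64, 4], r=2))  # -> 4.0
```

This is why the bound is robust to the adversarial job scheduler: a burst of availability that the job cannot exploit is excluded from the average against which speedup is measured.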
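The abstract does not spell out the feedback rule itself. The sketch below illustrates one multiplicative-increase/multiplicative-decrease update of a per-quantum processor "desire," in the spirit of the paper's adaptive scheduler: grow the request after an efficient, fully satisfied quantum, shrink it after an inefficient one. The parameter names delta and rho and all values here are assumptions for illustration, not taken from the abstract.

```python
def next_desire(desire, usage, allotment, quantum_len, delta=0.8, rho=2.0):
    """One quantum of parallelism feedback (illustrative sketch).

    usage is the number of processor cycles the job actually used
    during the quantum; allotment is the number of processors granted.
    """
    # A quantum is "efficient" if the job used at least a delta fraction
    # of the allotted processor cycles, and "satisfied" if the allotment
    # met the previous request.
    efficient = usage >= delta * allotment * quantum_len
    satisfied = allotment >= desire
    if not efficient:
        return max(desire / rho, 1)  # wasting cycles: request less
    if satisfied:
        return desire * rho          # efficient and satisfied: request more
    return desire                    # efficient but deprived: hold steady

# Example: a job allotted 8 processors over a quantum of length 100
# used only 300 of the 800 available cycles, so its request shrinks.
print(next_desire(desire=8, usage=300, allotment=8, quantum_len=100))  # -> 4.0
```

Each quantum, the job scheduler would then allot min(request, current availability) processors, which is what makes the feedback useful for overall system efficiency.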