Scheduling in multiprogrammed parallel systems
SIGMETRICS '88 Proceedings of the 1988 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Speedup Versus Efficiency in Parallel Systems
IEEE Transactions on Computers
Mul-T: a high-performance parallel Lisp
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Process control and scheduling issues for multiprogrammed shared-memory multiprocessors
SOSP '89 Proceedings of the twelfth ACM symposium on Operating systems principles
Lazy task creation: a technique for increasing the granularity of parallel programs
LFP '90 Proceedings of the 1990 ACM conference on LISP and functional programming
The performance of multiprogrammed multiprocessor scheduling algorithms
SIGMETRICS '90 Proceedings of the 1990 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Dynamic Processor Self-Scheduling for General Parallel Nested Loops
IEEE Transactions on Computers
Low-overhead scheduling of nested parallelism
IBM Journal of Research and Development
A dynamic processor allocation policy for multiprogrammed shared-memory multiprocessors
ACM Transactions on Computer Systems (TOCS)
Performance analysis of job scheduling policies in parallel supercomputing environments
Proceedings of the 1993 ACM/IEEE conference on Supercomputing
A survey of PRAM simulation techniques
ACM Computing Surveys (CSUR)
Robust partitioning policies of multiprocessor systems
Performance Evaluation - Special issue: performance modeling of parallel processing systems
Implementation of a portable nested data-parallel language
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Efficient compilation of high-level data parallel algorithms
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Scheduling parallelizable tasks to minimize average response time
SPAA '94 Proceedings of the sixth annual ACM symposium on Parallel algorithms and architectures
Data parallel programming in an adaptive environment
Fast and efficient simulations among CRCW PRAMs
Journal of Parallel and Distributed Computing
Cilk: an efficient multithreaded runtime system
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Provably efficient scheduling for languages with fine-grained parallelism
Proceedings of the seventh annual ACM symposium on Parallel algorithms and architectures
Compiler and runtime support for programming in adaptive parallel environments
A provable time and space efficient implementation of NESL
Proceedings of the first ACM SIGPLAN international conference on Functional programming
On multiprocessor system scheduling
Proceedings of the eighth annual ACM symposium on Parallel algorithms and architectures
Executing multithreaded programs efficiently
Using parallel program characteristics in dynamic processor allocation policies
Performance Evaluation
Space-Efficient Scheduling of Multithreaded Computations
SIAM Journal on Computing
Thread scheduling for multiprogrammed multiprocessors
Proceedings of the tenth annual ACM symposium on Parallel algorithms and architectures
The implementation of the Cilk-5 multithreaded language
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
STOC '99 Proceedings of the thirty-first annual ACM symposium on Theory of computing
Provably efficient scheduling for languages with fine-grained parallelism
Journal of the ACM (JACM)
SODA '93 Proceedings of the fourth annual ACM-SIAM Symposium on Discrete algorithms
Preemptive scheduling of parallel jobs on multiprocessors
Proceedings of the seventh annual ACM-SIAM symposium on Discrete algorithms
Space-efficient scheduling of nested parallelism
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Parallel Evaluation of General Arithmetic Expressions
Journal of the ACM (JACM)
Scheduling multithreaded computations by work stealing
Journal of the ACM (JACM)
ZPL: A Machine Independent Programming Language for Parallel Computers
IEEE Transactions on Software Engineering - Special issue on architecture-independent languages and software tools for parallel processing
ACM Transactions on Programming Languages and Systems (TOPLAS)
Scheduling independent tasks to reduce mean finishing time
Communications of the ACM
On the Benefits and Limitations of Dynamic Partitioning in Parallel Computer Systems
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Analysis of Non-Work-Conserving Processor Partitioning Policies
IPPS '95 Proceedings of the Workshop on Job Scheduling Strategies for Parallel Processing
Non-clairvoyant multiprocessor scheduling of jobs with changing execution characteristics
Journal of Scheduling - Special issue: On-line scheduling
Adaptive and reliable parallel computing on networks of workstations
ATEC '97 Proceedings of the annual conference on USENIX Annual Technical Conference
Adaptive work stealing with parallelism feedback
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
Using application information to drive adaptive grid middleware scheduling decisions
Proceedings of the 2nd workshop on Middleware-application interaction: affiliated with the DisCoTec federated conferences 2008
Adaptive work-stealing with parallelism feedback
ACM Transactions on Computer Systems (TOCS)
Languages and Compilers for Parallel Computing
The design of a task parallel library
Proceedings of the 24th ACM SIGPLAN conference on Object oriented programming systems languages and applications
Improved results for scheduling batched parallel jobs by using a generalized analysis framework
Journal of Parallel and Distributed Computing
Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Provably efficient two-level adaptive scheduling
JSSPP'06 Proceedings of the 12th international conference on Job scheduling strategies for parallel processing
Run-time automatic performance tuning for multicore applications
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part I
BWS: balanced work stealing for time-sharing multicores
Proceedings of the 7th ACM european conference on Computer Systems
Palirria: Accurate On-line Parallelism Estimation for Adaptive Work-Stealing
Proceedings of Programming Models and Applications on Multicores and Manycores
DWS: Demand-aware Work-Stealing in Multi-programmed Multi-core Architectures
Proceedings of Programming Models and Applications on Multicores and Manycores
Competitive online adaptive scheduling for sets of parallel jobs with fairness and efficiency
Journal of Parallel and Distributed Computing
Multiprocessor scheduling in a shared multiprogramming environment is often structured as two-level scheduling, where a kernel-level job scheduler allots processors to jobs and a user-level task scheduler schedules the work of a job on the allotted processors. In this context, the number of processors allotted to a particular job may vary during the job's execution, and the task scheduler must adapt to these changes in processor resources. For overall system efficiency, the task scheduler should also provide parallelism feedback to the job scheduler to avoid the situation where a job is allotted processors that it cannot use productively.

We present an adaptive task scheduler for multitasked jobs with dependencies that provides continual parallelism feedback to the job scheduler in the form of requests for processors. Our scheduler guarantees that a job completes near-optimally while utilizing at least a constant fraction of the allotted processor cycles. Our scheduler can be applied to schedule data-parallel programs, such as those written in High Performance Fortran (HPF), *Lisp, C*, NESL, and ZPL.

Our analysis models the job scheduler as the task scheduler's adversary, challenging the task scheduler to be robust to the system environment and the job scheduler's administrative policies. For example, the job scheduler can make available a huge number of processors exactly when the job has little use for them. To analyze the performance of our adaptive task scheduler under this stringent adversarial assumption, we introduce a new technique called "trim analysis," which allows us to prove that our task scheduler performs poorly on at most a small number of time steps, while exhibiting near-optimal behavior on the vast majority.

To be precise, suppose that a job has work T1 and critical-path length T∞ and is running on a machine with P processors. Using trim analysis, we prove that our scheduler completes the job in O(T1/P̃ + T∞ + L lg P) time steps, where L is the length of a scheduling quantum and P̃ denotes the O(T∞ + L lg P)-trimmed availability. This quantity is the average of the processor availability over all time steps, excluding the O(T∞ + L lg P) time steps with the highest processor availability. When T1/T∞ ≫ P̃ (the job's parallelism dominates the O(T∞ + L lg P)-trimmed availability), the job achieves nearly perfect linear speedup. Conversely, when T1/T∞ ≪ P̃, the asymptotic running time of the job is nearly the length of its critical path.
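The completion-time bound can be made concrete with a small numeric sketch. The Python snippet below is illustrative only: the function names, the synthetic availability trace, and the choice of constant 1 for the hidden O-notation factors are assumptions, not part of the paper. It computes the trimmed availability P̃ from a per-step availability trace by discarding the O(T∞ + L lg P) steps with the highest availability, then evaluates T1/P̃ + T∞ + L lg P.

```python
import math
import random


def trimmed_availability(availability, trim):
    """Average availability after discarding the `trim` time steps with the
    highest processor availability (a stand-in for the paper's P~)."""
    if trim >= len(availability):
        return 0.0
    kept = sorted(availability)[: len(availability) - trim]
    return sum(kept) / len(kept)


def completion_bound(t1, t_inf, availability, quantum_len, p):
    """Evaluate the O(T1/P~ + T_inf + L lg P) bound with all hidden
    constants assumed to be 1 (an illustrative simplification)."""
    trim = round(t_inf + quantum_len * math.log2(p))  # trim O(T_inf + L lg P) steps
    p_tilde = trimmed_availability(availability, trim)
    return t1 / p_tilde + t_inf + quantum_len * math.log2(p)


if __name__ == "__main__":
    # Hypothetical job: work T1 = 1e6, critical path T_inf = 1e3, quantum
    # length L = 100, machine with P = 64 processors, and a synthetic trace
    # of how many processors the job scheduler offers at each time step.
    random.seed(0)
    trace = [random.randint(1, 64) for _ in range(20_000)]
    print(completion_bound(1e6, 1e3, trace, 100, 64))
```

In this sketch, when T1/T∞ is much larger than the trace's trimmed average, the T1/P̃ term dominates and the bound tracks linear speedup on the trimmed availability; when it is much smaller, the T∞ term dominates, matching the critical-path regime described above.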