IEEE Transactions on Parallel and Distributed Systems
Job schedulers improve system utilization by requiring users to estimate how long their jobs will run and by using this information to pack (or "backfill") the jobs more tightly. Surprisingly, many studies find that deliberately making estimates less accurate boosts (or does not affect) performance, which helps explain why production systems still rely exclusively on notoriously inaccurate estimates. We prove these studies wrong by showing that their methodology is erroneous. The studies model an estimate e as being correlated with r · F (where r is the runtime of the associated job, F is some "badness" factor, and larger F values imply increased inaccuracy). We show that this model is invalid because (1) it conveys too much information to the scheduler; (2) it induces favoritism of short jobs; and (3) it is inherently different from real user inaccuracy, which associates 90% of the jobs with merely 20 estimate values, hindering the scheduler's ability to backfill. We conclude that researchers must stop using multiples of runtimes as estimates, or else their results will likely be invalid. We develop (and propose to use) a realistic model that preserves the estimates' modality and allows increased inaccuracy to be soundly simulated by, e.g., associating more jobs with the maximal runtime allowed (an always-popular estimate, which prevents backfilling).
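To make the contrast concrete, the following is a minimal sketch of the two kinds of estimate generators discussed above. It is not the model developed in the paper. The list of popular values, the uniform distribution of F, and the names `multiplicative_estimate`, `modal_estimate`, `badness`, and `p_max` are all assumptions made for illustration.

```python
import random

# Hypothetical "round" estimate values in seconds. The abstract says about 20
# values cover 90% of jobs but does not list them; these ten are invented.
POPULAR_ESTIMATES = [300, 600, 900, 1800, 3600, 7200, 14400, 28800, 43200, 64800]
MAX_ESTIMATE = POPULAR_ESTIMATES[-1]  # the maximal runtime allowed


def multiplicative_estimate(runtime, badness, rng):
    """Criticized style of model: estimate = runtime * F.

    F is drawn uniformly from [1, 1 + badness]; the distribution is an
    assumption, the abstract only says the estimate correlates with r * F.
    """
    return runtime * rng.uniform(1.0, 1.0 + badness)


def modal_estimate(runtime, p_max, rng):
    """Illustrative modal model (an assumption, not the paper's model).

    With probability p_max the job gets the maximal allowed estimate;
    otherwise it gets the smallest popular value that covers its runtime.
    Raising p_max simulates increased inaccuracy while keeping modality.
    Assumes runtime <= MAX_ESTIMATE.
    """
    if rng.random() < p_max:
        return MAX_ESTIMATE
    for value in POPULAR_ESTIMATES:
        if value >= runtime:
            return value
    return MAX_ESTIMATE


if __name__ == "__main__":
    rng = random.Random(42)
    runtimes = [rng.uniform(1, MAX_ESTIMATE) for _ in range(1000)]
    mult = [multiplicative_estimate(r, 2.0, rng) for r in runtimes]
    modal = [modal_estimate(r, 0.2, rng) for r in runtimes]
    # Multiplicative estimates are nearly all distinct and track the runtime;
    # modal estimates collapse onto a handful of values.
    print("distinct multiplicative estimates:", len(set(mult)))
    print("distinct modal estimates:", len(set(modal)))
```

In this sketch the multiplicative generator gives almost every job its own estimate, and that estimate scales with the true runtime, so the scheduler can still tell short jobs from long ones. The modal generator maps many different runtimes onto the same few values, which is the property the abstract attributes to real user estimates.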