Using inaccurate estimates accurately

Authors:
Dan Tsafrir
Affiliations:
Department of Computer Science, Technion-Israel Institute of Technology, Haifa, Israel
Venue:
JSSPP'10 Proceedings of the 15th international conference on Job scheduling strategies for parallel processing
Year:
2010

Abstract

Job schedulers improve system utilization by requiring users to estimate how long their jobs will run and by using this information to better pack (or "backfill") the jobs. But, surprisingly, many studies find that deliberately making estimates less accurate boosts (or does not affect) performance, which helps explain why production systems still rely exclusively on notoriously inaccurate estimates. We prove these studies wrong by showing that their methodology is erroneous. The studies model an estimate e as being correlated with r · F (where r is the runtime of the associated job, F is some "badness" factor, and larger F values imply increased inaccuracy). We show this model is invalid, because: (1) it conveys too much information to the scheduler; (2) it induces favoritism of short jobs; and (3) it is inherently different from real user inaccuracy, which associates 90% of the jobs with merely 20 estimate values, hindering the scheduler's ability to backfill. We conclude that researchers must stop using multiples of runtimes as estimates, or else their results will likely be invalid. We develop (and propose to use) a realistic model that preserves the estimates' modality and makes it possible to soundly simulate increased inaccuracy by, e.g., associating more jobs with the maximal runtime allowed (an always-popular estimate, which prevents backfilling).
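
The contrast drawn in the abstract can be illustrated with a small Python sketch. It is not the paper's model: the runtimes, the list of popular estimate values, the round-up rule in `modal_estimates`, and the helper `push_to_maximum` are all hypothetical, chosen only to show the idea. Estimates of the form r · F keep the exact ordering of the runtimes (so the scheduler still learns which jobs are short), whereas modal estimates collapse many jobs onto a few values, and inaccuracy can be increased by moving more jobs to the maximal allowed value.

```python
import random
from bisect import bisect_left


def multiplicative_estimates(runtimes, badness):
    """Estimate e = r * F with one fixed factor F >= 1 (the criticized model)."""
    return [r * badness for r in runtimes]


def modal_estimates(runtimes, popular_values):
    """Round each runtime up to the nearest popular value.

    Hypothetical rule for illustration; runtimes above every popular value
    are assigned the largest one (the maximal runtime allowed).
    """
    values = sorted(popular_values)
    estimates = []
    for r in runtimes:
        i = bisect_left(values, r)
        estimates.append(values[min(i, len(values) - 1)])
    return estimates


def push_to_maximum(estimates, max_estimate, fraction, seed=0):
    """Make estimates less accurate by moving a fraction of jobs to the maximum."""
    rng = random.Random(seed)
    k = round(fraction * len(estimates))
    chosen = set(rng.sample(range(len(estimates)), k))
    return [max_estimate if i in chosen else e for i, e in enumerate(estimates)]


def ranking(xs):
    """Indices of xs ordered from smallest to largest value."""
    return sorted(range(len(xs)), key=lambda i: xs[i])


# Made-up runtimes (seconds) and made-up popular estimate values.
runtimes = [30, 95, 400, 1200, 3500, 3700, 7000, 50000]
popular = [300, 900, 3600, 14400, 64800]

mult = multiplicative_estimates(runtimes, 10)
modal = modal_estimates(runtimes, popular)
worse = push_to_maximum(modal, max(popular), fraction=0.5)

# r * F preserves the shortest-job-first order exactly, despite being "inaccurate".
print("order preserved by r*F:", ranking(mult) == ranking(runtimes))
print("distinct r*F estimates:", len(set(mult)))
print("distinct modal estimates:", len(set(modal)))
print("jobs at the maximum after degrading:", worse.count(max(popular)))
```

In this sketch, raising `badness` never hides which job is shorter, while raising `fraction` removes information from the scheduler, which is the kind of inaccuracy the abstract argues simulations should model.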