Scheduling Restartable Jobs with Short Test Runs

Authors:
Ojaswirajanya Thebe;David P. Bunde;Vitus J. Leung
Affiliations:
Knox College;Knox College;Sandia National Laboratories
Venue:
Job Scheduling Strategies for Parallel Processing
Year:
2009

Abstract

In this paper, we examine the concept of giving every job a trial run before committing it to run until completion. Trial runs allow immediate job failures to be detected shortly after job submission and benefit short jobs by letting them run and finish early. This occurs without inflicting a significant penalty on longer jobs, whose average and maximum waiting time are actually improved in some cases. The strategy does not require preemption and instead uses the ability to kill and restart a job from the beginning, which it does at most once for each job. While others have proposed similar strategies, our algorithm is distinguished by its determination to give each job a fixed-length trial run as soon as possible. Our study is also more focused, including a detailed description of the algorithm and an examination of the effect of varying the length of a trial run.
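The trial-run idea in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single machine that runs one job at a time, whereas the paper targets parallel job scheduling, and the names `Job`, `simulate`, and `trial_len` are hypothetical. Each job receives one trial run of at most `trial_len` time units as soon as the machine is free. A job that does not finish within its trial is killed once and later restarted from the beginning, then runs to completion without preemption.

```python
from collections import deque
from typing import NamedTuple


class Job(NamedTuple):
    # Hypothetical job record; runtime is the job's true running time.
    name: str
    arrival: float
    runtime: float


def simulate(jobs, trial_len):
    """Return {job name: completion time} on one machine (an assumption)."""
    pending = deque(sorted(jobs, key=lambda j: j.arrival))  # not yet arrived
    untried = deque()  # arrived, still waiting for their trial run
    tried = deque()    # killed after the trial, waiting for a full run
    finish = {}
    clock = 0.0
    while pending or untried or tried:
        # Admit every job that has arrived by the current time.
        while pending and pending[0].arrival <= clock:
            untried.append(pending.popleft())
        if untried:
            # Trial runs take priority so new jobs are tested promptly.
            job = untried.popleft()
            if job.runtime <= trial_len:
                clock += job.runtime      # short job finishes in its trial
                finish[job.name] = clock
            else:
                clock += trial_len        # kill the job; it restarts later
                tried.append(job)
        elif tried:
            # Restart from the beginning and run to completion.
            job = tried.popleft()
            clock += job.runtime
            finish[job.name] = clock
        else:
            clock = pending[0].arrival    # machine idles until next arrival
    return finish


if __name__ == "__main__":
    print(simulate([Job("A", 0, 10), Job("B", 1, 1)], trial_len=2))
```

In this toy instance, the short job B completes at time 3 instead of time 11 under plain first-come-first-served ordering, while the long job A completes at time 13 instead of 10. That reflects the trade-off the abstract describes: a small delay for long jobs in exchange for early completion of short ones. The sketch does not model immediate job failures, which would appear here simply as very short runtimes.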