PV-EASY: a strict fairness guaranteed and prediction enabled scheduler in parallel job scheduling

  • Authors:
  • Yulai Yuan;Guangwen Yang;Yongwei Wu;Weimin Zheng

  • Affiliations:
  • Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China;Tsinghua University, Beijing, China

  • Venue:
  • Proceedings of the 19th ACM International Symposium on High Performance Distributed Computing
  • Year:
  • 2010

Abstract

As the most widely used parallel job scheduling strategy in production schedulers, EASY has achieved great success, not only because it balances fairness and performance, but also because it is applicable to most HPC systems. However, unfairness still exists in EASY. For the real workloads used in this work, our simulation shows that a blocked job can be delayed by later jobs for more than 90 hours. In addition, EASY cannot directly employ parallel job runtime prediction techniques, because doing so would lead to a serious situation called reservation violation. In this paper, we aim to guarantee strict fairness (no job is delayed by any job of lower priority) while achieving attractive performance, and to employ prediction in parallel job scheduling without causing reservation violation. We propose two novel strategies, shadow load preemption (SLP) and venture backfilling (VB), which are integrated together into EASY to construct a preemptive venture EASY backfilling (PV-EASY) strategy. Experimental results on three workloads from real HPC systems demonstrate that: first, PV-EASY guarantees strict fairness and avoids reservation violation when job runtime prediction techniques are employed in scheduling; second, PV-EASY achieves the same performance as EASY and outperforms prediction-enabled EASY; third, the preemption in PV-EASY is not resource-intensive and is simple enough to be implemented in all HPC systems where EASY works. These advantages make PV-EASY more attractive than EASY in parallel job scheduling, from both academic and industry perspectives.
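For readers unfamiliar with the baseline, the following is a minimal sketch of the backfill test in standard EASY as it is commonly described, not of PV-EASY itself. The abstract does not specify how SLP and VB work, so neither is modelled here. All names (`Job`, `shadow_time_and_extra`, `can_backfill`) are hypothetical and chosen only for illustration. The blocked job at the head of the queue receives a reservation at the "shadow time", the earliest moment at which enough processors are expected to be free, and a later job may start early only if it is expected not to disturb that reservation.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    procs: int        # processors requested
    estimate: float   # runtime estimate the scheduler relies on (hours)


def shadow_time_and_extra(now, free, running, head):
    """Return (shadow_time, extra_procs) for the blocked head job.

    running is a list of (estimated_end_time, procs) for executing jobs.
    shadow_time is when the head job is expected to be able to start;
    extra_procs are the processors left over at that moment.
    """
    if free >= head.procs:
        return now, free - head.procs
    avail = free
    for end, procs in sorted(running):
        avail += procs
        if avail >= head.procs:
            return end, avail - head.procs
    raise ValueError("head job requests more processors than the machine has")


def can_backfill(now, free, running, head, cand):
    """EASY rule: a later job may jump ahead only if it fits in the idle
    processors now and either ends before the shadow time (according to
    its estimate) or uses only the extra processors."""
    if cand.procs > free:
        return False
    shadow, extra = shadow_time_and_extra(now, free, running, head)
    return now + cand.estimate <= shadow or cand.procs <= extra
```

As an illustration under assumed numbers: on a 10-processor machine with one running job holding 6 processors until t = 10, a head job requesting 8 processors gets a reservation at t = 10 with 2 extra processors. A 4-processor candidate estimated at 5 hours passes the test. If that estimate is a prediction and the job actually runs for 12 hours, only 6 processors are free at t = 10 and the head job cannot start on time, which is the kind of reservation violation the abstract refers to.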