Scheduling for Parallel Supercomputing: A Historical Perspective of Achievable Utilization

  • Authors:
  • James Patton Jones;Bill Nitzberg

  • Affiliations:
  • -;-

  • Venue:
  • IPPS/SPDP '99/JSSPP '99 Proceedings of the Job Scheduling Strategies for Parallel Processing
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

The NAS facility has operated parallel supercomputers for the past 11 years, including the Intel iPSC/860, Intel Paragon, Thinking Machines CM-5, IBM SP-2, and Cray Origin 2000. Across this wide variety of machine architectures, across a span of 10 years, across a large number of different users, and through thousands of minor configuration and policy changes, the utilization of these machines shows three general trends: (1) scheduling using a naive FCFS first-fit policy results in 40-60% utilization, (2) switching to the more sophisticated dynamic backfilling scheduling algorithm improves utilization by about 15 percentage points (yielding about 70% utilization), and (3) reducing the maximum allowable job size further increases utilization. Most surprising is the consistency of these trends. Over the lifetime of the NAS parallel systems, we made hundreds, perhaps thousands, of small changes to hardware, software, and policy, yet utilization was affected little. In particular, these results show that the goal of achieving near 100% utilization while supporting a real parallel supercomputing workload is unrealistic.