Time-critical scheduling on a well utilised HPC system at ECMWF using loadleveler with resource reservation

  • Authors:
  • Graham Holt

  • Affiliations:
  • ECMWF, Reading, Berkshire, UK

  • Venue:
  • JSSPP'04 Proceedings of the 10th international conference on Job Scheduling Strategies for Parallel Processing
  • Year:
  • 2004

Abstract

This article is written in the context of running a suite of time-critical operational numerical weather prediction batch jobs, along with a substantial number of research batch jobs, on a large IBM Cluster 1600 system. The batch subsystem used is IBM's LoadLeveler, incorporating a little-known feature called Resource Reservation. The article describes how the mixture of operational and research parallel batch jobs is scheduled to run on the 117 nodes provided, and how Resource Reservation for operational jobs is performed without reference to job class. Here, research parallel batch jobs are jobs that request more than one CPU, and they must run with consistent runtimes so that resources are released predictably. The article also explains how consistent runtimes are achieved.