Impact of job mix on optimizations for space sharing schedulers

  • Authors:
  • Jaspal Subhlok;Thomas Gross;Takashi Suzuoka

  • Affiliations:
  • School of Computer Science, Carnegie Mellon University, Pittsburgh, PA;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA and Institut fuer Computersysteme, ETH Zuerich, 8092 Zuerich, Switzerland;Toshiba Corporation, 1, Komukai, Toshiba-Cho, Saiwai-Ku, Kawasaki 210, Japan

  • Venue:
  • Supercomputing '96: Proceedings of the 1996 ACM/IEEE Conference on Supercomputing
  • Year:
  • 1996

Abstract

This paper addresses job scheduling for parallel supercomputers. Modern parallel systems with n nodes can be used by jobs requesting up to n nodes. If fewer than n nodes are requested, multiple jobs can run at the same time, allowing several users to share the system. One of the challenges for the operating system is to give reasonable service to a diverse group of requests. A single long-running 1-node job may effectively block the whole machine if the next job requests all n nodes: that job cannot start until the 1-node job finishes, and if jobs are started strictly in queue order, every job behind it must wait as well. To date, various policies have been proposed for scheduling highly parallel computers, but, as users of current supercomputers know, these policies are far from perfect. This paper reports on measurements of the usage of a 96-node Intel Paragon at ETH Zurich, a 512-node IBM SP2 at the Cornell Theory Center, and a 512-node Cray T3D at the Pittsburgh Supercomputing Center. We discuss the common characteristics of the different workloads and identify their impact on job scheduling techniques for such parallel systems. The metrics used for evaluating scheduling are based on turnaround time and fairness among jobs. We specifically show how two simple scheduling optimizations based on reordering the waiting queue can effectively improve scheduling performance on real workloads. An important contribution of this paper is to establish that supercomputer workloads do exhibit some common characteristics but also differ in important ways, and that knowledge of the workload is important for the design of effective scheduling algorithms. Given the current ad hoc approach to tuning scheduler systems, these results are of interest to scheduling researchers, supercomputer installations, and developers of scheduling software.
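The blocking effect and the general idea of reordering the waiting queue can be illustrated with a small event-driven simulation. This Python code is a sketch under stated assumptions, not the paper's method. The names `Job`, `simulate` and `mean_turnaround` are hypothetical. Each job is assumed to hold its requested nodes exclusively for its whole runtime, with no preemption. `reorder=True` uses a generic first-fit scan of the queue, which is not necessarily either of the two optimizations the paper evaluates. The job list is invented for illustration and is not drawn from the measured workloads.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    arrival: float  # submission time
    nodes: int      # nodes requested, held for the whole run
    runtime: float  # execution time once started


def simulate(jobs, total_nodes, reorder=False):
    """Return {job name: (start, finish)} for a space-sharing scheduler.

    reorder=False: strict FCFS; the first waiting job that does not fit
                   blocks every job queued behind it.
    reorder=True:  hypothetical first-fit scan; any waiting job that fits
                   the free nodes may start, even past a blocked job.
    """
    pending = sorted(jobs, key=lambda j: j.arrival)  # not yet arrived
    queue = []    # arrived but not started, in arrival order
    running = []  # min-heap of (finish_time, nodes)
    free, now, i, result = total_nodes, 0.0, 0, {}
    while i < len(pending) or queue or running:
        # Release the nodes of every job that has finished by `now`.
        while running and running[0][0] <= now:
            free += heapq.heappop(running)[1]
        # Move newly arrived jobs into the waiting queue.
        while i < len(pending) and pending[i].arrival <= now:
            queue.append(pending[i])
            i += 1
        # Start jobs according to the chosen policy.
        waiting, blocked = [], False
        for job in queue:
            if not blocked and job.nodes <= free:
                free -= job.nodes
                result[job.name] = (now, now + job.runtime)
                heapq.heappush(running, (now + job.runtime, job.nodes))
            else:
                waiting.append(job)
                if not reorder:
                    blocked = True  # FCFS: no job may pass the blocked one
        queue = waiting
        # Advance to the next event: an arrival or a completion.
        events = [pending[i].arrival] if i < len(pending) else []
        if running:
            events.append(running[0][0])
        if not events:
            break  # any remaining job requests more than total_nodes
        now = max(now, min(events))
    return result


def mean_turnaround(jobs, schedule):
    """Average of (finish - arrival) over all jobs."""
    return sum(schedule[j.name][1] - j.arrival for j in jobs) / len(jobs)


# Hypothetical 8-node machine: a long 1-node job A, then a full-machine job B.
jobs = [Job("A", 0, 1, 100), Job("B", 1, 8, 10),
        Job("C", 2, 2, 5), Job("D", 3, 4, 5)]
fcfs = simulate(jobs, total_nodes=8)
ff = simulate(jobs, total_nodes=8, reorder=True)
print(mean_turnaround(jobs, fcfs))  # 108.5
print(mean_turnaround(jobs, ff))    # 54.75
```

Under strict FCFS, C and D sit behind B until time 110 while 7 nodes stay idle. The first-fit scan runs them immediately, and B still starts at time 100 because it must wait for A either way. A first-fit scan without reservations can, however, delay a large job indefinitely if small jobs keep arriving. This is one reason fairness among jobs needs to be measured alongside turnaround time.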