Impact of job mix on optimizations for space sharing schedulers

Authors:
Jaspal Subhlok;Thomas Gross;Takashi Suzuoka
Affiliations:
School of Computer Science, Carnegie Mellon University, Pittsburgh, PA;School of Computer Science, Carnegie Mellon University, Pittsburgh, PA and Institut fuer Computersysteme, ETH Zuerich, 8092 Zuerich, Switzerland;Toshiba Corporation, 1, Komukai, Toshiba-Cho, Saiwai-Ku, Kawasaki 210, Japan
Venue:
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Year:
1996

Abstract

This paper addresses job scheduling for parallel supercomputers. Modern parallel systems with n nodes can be used by jobs requesting up to n nodes. If fewer than n nodes are requested, multiple jobs can run at the same time, allowing several users to share the system. One of the challenges for the operating system is to give reasonable service to a diverse group of requests. A single 1-node job that runs for a long time may effectively block the whole machine if the next job requests all n nodes. To date, various policies have been proposed for scheduling highly parallel computers, but as users of current supercomputers know, these policies are far from perfect. This paper reports on measurements of the usage of a 96-node Intel Paragon at ETH Zurich, a 512-node IBM SP2 at Cornell Theory Center, and a 512-node Cray T3D at Pittsburgh Supercomputing Center. We discuss the common characteristics of the different workloads and identify their impact on job scheduling techniques for such parallel systems. The metrics used for evaluating scheduling are based on turnaround time and fairness among jobs. We specifically show how two simple scheduling optimizations based on reordering the waiting queue can be used to effectively improve scheduling performance on real workloads. An important contribution of this paper is to establish that supercomputer workloads do exhibit some common characteristics but also differ in important ways, and that knowledge of the workload is important for the design of effective scheduling algorithms. Given the current ad hoc approach to tuning scheduler systems, these results are of interest to scheduling researchers, supercomputer installations, and developers of scheduling software.
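The blocking scenario described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's method: the function name `mean_turnaround`, the job tuples, and the naive "skip the blocked job" rule are all assumptions made for illustration, and every job is assumed to arrive at time 0 with a known runtime. It compares strict first-come-first-served order against a simple reordering of the waiting queue on a hypothetical 4-node machine.

```python
import heapq
from typing import List, Tuple

Job = Tuple[int, float]  # (nodes requested, runtime); hypothetical job record


def mean_turnaround(jobs: List[Job], total_nodes: int, reorder: bool) -> float:
    """Simulate a space-sharing machine and return the mean turnaround time.

    All jobs are assumed to arrive at time 0, so turnaround equals finish time.
    With reorder=False the queue is strict FCFS; with reorder=True a blocked
    job is skipped and later jobs that fit are started (an illustrative rule,
    not the optimization evaluated in the paper).
    """
    if any(nodes > total_nodes for nodes, _ in jobs):
        raise ValueError("a job requests more nodes than the machine has")

    waiting = list(jobs)
    running: List[Tuple[float, int]] = []  # min-heap of (finish time, nodes)
    free = total_nodes
    now = 0.0
    finish_times: List[float] = []

    while waiting or running:
        i = 0
        while i < len(waiting):
            nodes, runtime = waiting[i]
            if nodes <= free:
                # Job fits: allocate nodes and record when it will finish.
                free -= nodes
                heapq.heappush(running, (now + runtime, nodes))
                finish_times.append(now + runtime)
                waiting.pop(i)
            elif reorder:
                i += 1  # skip the blocked job and try the ones behind it
            else:
                break  # strict FCFS: the blocked head holds up the queue
        # Advance time to the next job completion and release its nodes.
        now, nodes = heapq.heappop(running)
        free += nodes

    return sum(finish_times) / len(finish_times)


# A long 1-node job, then a full-machine job, then two short 1-node jobs.
example_jobs = [(1, 10.0), (4, 1.0), (1, 1.0), (1, 1.0)]
print(mean_turnaround(example_jobs, total_nodes=4, reorder=False))  # 11.25
print(mean_turnaround(example_jobs, total_nodes=4, reorder=True))   # 5.75
```

In this toy case the full-machine job finishes at the same time under both rules, but the short jobs no longer wait behind it. A rule this naive could delay large jobs indefinitely on a busy machine, which is one reason a fairness metric matters alongside turnaround time.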