An infrastructure for efficient parallel job execution in Terascale computing environments

  • Authors:
  • J. E. Moreira; W. Chan; L. L. Fong; H. Franke; M. A. Jette

  • Affiliations:
  • International Business Machines Corporation, Armonk, NY (J. E. Moreira, W. Chan, L. L. Fong, H. Franke); Lawrence Livermore National Laboratory, Livermore, CA (M. A. Jette)

  • Venue:
  • SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
  • Year:
  • 1998

Abstract

Recent Terascale computing environments, such as those in the Department of Energy Accelerated Strategic Computing Initiative, present a new challenge to job scheduling and execution systems. The traditional way to concurrently execute multiple jobs in such large machines is through space-sharing: each job is given dedicated use of a pool of processors.

Previous work in this area has demonstrated the benefits of sharing a parallel machine's resources not only spatially but also temporally. Time-sharing creates virtual processors for the execution of jobs. The scheduling is typically performed cyclically, and each time-slice of the cycle can be considered an independent virtual machine. When all tasks of a parallel job are scheduled to run in the same time-slice (the same virtual machine), gang-scheduling is accomplished. Research has shown that gang-scheduling can greatly improve system utilization and job response time in large parallel systems.

We are developing GangLL, a research prototype system for performing gang-scheduling on the ASCI Blue-Pacific machine, an IBM RS/6000 SP to be installed at Lawrence Livermore National Laboratory. This machine consists of several hundred nodes interconnected by a high-speed communication switch. GangLL is organized as a centralized scheduler that performs global decision-making and a local daemon in each node that controls job execution according to those decisions.

The centralized scheduler builds an Ousterhout matrix that precisely defines the temporal and spatial allocation of tasks in the system. Once the matrix is built, it is distributed to each of the local daemons using a scalable hierarchical distribution scheme. A two-phase commit is used in the distribution scheme to guarantee that all local daemons have consistent information. The local daemons enforce the schedule dictated by the Ousterhout matrix in their corresponding nodes. This requires suspending and resuming execution of tasks and multiplexing access to the communication switch.

Large supercomputing centers tend to have their own job scheduling systems to handle site-specific conditions. Therefore, we are designing GangLL so that it can interact with an external site scheduler. The goal is to let the site scheduler control the spatial allocation of jobs, if so desired, and decide when jobs run. GangLL then performs the detailed temporal allocation and controls the actual execution of jobs. The site scheduler can control the fraction of a shared processor that a job receives through an execution factor parameter.

To quantify the benefits of our gang-scheduling system for job execution in a large parallel system, we simulate the system with a realistic workload. We measure performance parameters under various degrees of time-sharing, characterized by the multiprogramming level. Our results show that higher multiprogramming levels lead to higher system utilization and lower job response times. We also report some results from the initial deployment of GangLL on a small multiprocessor system.
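
To make the Ousterhout matrix concrete, the following minimal sketch (in Python, with hypothetical job names and sizes, not GangLL's actual data structures) treats rows as time-slices and columns as processors; gang-scheduling places every task of a job in a single row so all tasks run concurrently:

    # Rows are time-slices (virtual machines); columns are processors.
    EMPTY = None

    def build_matrix(num_slices, num_processors):
        return [[EMPTY] * num_processors for _ in range(num_slices)]

    def gang_schedule(matrix, job, num_tasks):
        """Place all tasks of `job` in one time-slice, so they run together."""
        for row in matrix:
            free = [i for i, slot in enumerate(row) if slot is EMPTY]
            if len(free) >= num_tasks:
                for i in free[:num_tasks]:
                    row[i] = job
                return True
        return False  # no single slice has enough free processors

    # Multiprogramming level 3 on a hypothetical 4-processor machine.
    matrix = build_matrix(num_slices=3, num_processors=4)
    for job, tasks in [("A", 4), ("B", 2), ("C", 3), ("D", 2)]:
        gang_schedule(matrix, job, tasks)
    for t, row in enumerate(matrix):
        print(f"slice {t}: {row}")
    # slice 0: ['A', 'A', 'A', 'A']
    # slice 1: ['B', 'B', 'D', 'D']
    # slice 2: ['C', 'C', 'C', None]

Cycling through the rows time-shares the machine; each row behaves as an independent virtual machine, which is the property gang-scheduling exploits.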
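The hierarchical, two-phase distribution of the matrix can be sketched in the same spirit; the class and method names below are hypothetical illustrations, not GangLL's interfaces. The key invariant is that no daemon adopts a new matrix until every daemon in the tree has acknowledged it, so all nodes always act on a consistent schedule:

    class NodeDaemon:
        def __init__(self, children=()):
            self.children = list(children)
            self.pending = None   # staged matrix, not yet active
            self.current = None   # matrix currently being enforced

        def prepare(self, matrix):
            """Phase 1: stage the matrix; succeed only if the whole subtree does."""
            self.pending = matrix
            return all(child.prepare(matrix) for child in self.children)

        def commit(self):
            """Phase 2: atomically adopt the staged matrix."""
            self.current, self.pending = self.pending, None
            for child in self.children:
                child.commit()

    def distribute(root, matrix):
        if root.prepare(matrix):   # every daemon acknowledged
            root.commit()          # safe to switch everywhere
            return True
        return False               # abort: keep the old, consistent schedule

    # A small hypothetical daemon tree: one root, two intermediates, four leaves.
    leaves = [NodeDaemon() for _ in range(4)]
    root = NodeDaemon(children=[NodeDaemon(leaves[:2]), NodeDaemon(leaves[2:])])
    distribute(root, matrix={"slice0": ["A", "A", "A", "A"]})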