Highly efficient gang scheduling implementation

  • Authors:
  • Atsushi Hori;Hiroshi Tezuka;Yutaka Ishikawa

  • Affiliations:
Tsukuba Research Center, Real World Computing Partnership, Tsukuba Mitsui Building 16F, 1-6-1 Takezono, Tsukuba-shi, Ibaraki 305-0032, JAPAN (all authors)

  • Venue:
  • SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
  • Year:
  • 1998


Abstract

This paper is based on a new, more efficient gang scheduling implementation technique. Network preemption, in which network interface contexts are saved and restored, has already been proposed to enable parallel applications to perform efficient user-level communication. This network preemption technique can also be used to detect the global state, such as deadlock, of a parallel program execution. A gang scheduler, SCore-D, using the network preemption technique is implemented with PM, a user-level communication library.

This paper evaluates network preemption gang scheduling overhead using eight NAS parallel benchmark programs. The evaluation shows that saving and restoring network contexts accounts for almost half of the total gang scheduling overhead. A new mechanism is therefore proposed: holding multiple network contexts and merely switching the context pointers, without saving and restoring the network contexts. The NAS parallel benchmark evaluation shows that this almost halves the gang scheduling overhead. The maximum gang scheduling overhead among the benchmark programs is less than 10%, with a 40 msec time slice on 64 single-way Pentium Pros connected by Myrinet to form a PC cluster. Secondary cache misses were counted, and network preemption with multiple network contexts was found to be more cache-effective than with a single network context. The observed scheduling overhead for applications running on 64 nodes is only a small percentage of the execution time. The overheads of gang-switching between two NAS parallel benchmark programs are also evaluated; the additional overheads are less than 2% in most cases, with a 100 msec time slice on 64 nodes.
This slightly higher scheduling overhead, compared with switching a single parallel process, comes from more frequent cache misses. This paper contributes the following findings: i) gang scheduling overhead with network preemption can be sufficiently low; ii) the proposed network preemption with multiple network contexts is more cache-effective than with a single network context; and iii) network preemption can be applied to detect the global states of user parallel processes. The SCore-D gang scheduler, realized by network preemption, can utilize processor resources by detecting the global state of user parallel processes. Network preemption with multiple contexts exhibits highly efficient gang scheduling. The combination of low scheduling overhead and the global state detection mechanism achieves an interactive parallel programming environment in which parallel program development and production runs of parallel programs can be mixed freely.