This paper presents a new, more efficient implementation technique for gang scheduling. Network preemption, in which network interface contexts are saved and restored, has already been proposed to enable parallel applications to perform efficient user-level communication. The same technique can also be used to detect global states, such as deadlock, of a parallel program execution. A gang scheduler, SCore-D, is implemented using network preemption together with PM, a user-level communication library.

This paper evaluates the overhead of gang scheduling with network preemption using eight NAS parallel benchmark programs. The results show that saving and restoring network contexts accounts for almost half of the total gang scheduling overhead. A new mechanism is proposed that holds multiple network contexts and merely switches context pointers instead of saving and restoring the contexts.

With this mechanism, the NAS parallel benchmark evaluation shows that gang scheduling overhead is almost halved. The maximum gang scheduling overhead among the benchmark programs is less than 10%, with a 40 msec time slice on 64 single-way Pentium Pros connected by Myrinet to form a PC cluster. Secondary cache misses are also counted, and network preemption with multiple network contexts is found to be more cache-effective than with a single network context. The observed scheduling overhead for applications running on 64 nodes is only a small percentage of the execution time. The gang scheduling overheads of switching between two NAS parallel benchmark programs are also evaluated. The additional overheads are less than 2% in most cases, with a 100 msec time slice on 64 nodes. These slightly higher scheduling overheads, compared with switching a single parallel process, come from more frequent cache misses.

This paper contributes the following findings: i) gang scheduling overhead with network preemption can be sufficiently low, ii) the proposed network preemption with multiple network contexts is more cache-effective than with a single network context, and iii) network preemption can be applied to detect global states of user parallel processes.

The SCore-D gang scheduler realized by network preemption can utilize processor resources by detecting the global state of user parallel processes. Network preemption with multiple contexts enables highly efficient gang scheduling. The combination of low scheduling overhead and the global state detection mechanism achieves an interactive parallel programming environment in which parallel program development and production runs of parallel programs can be mixed freely.
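
The difference between saving and restoring a network context and merely switching context pointers can be illustrated with a small toy model. The sketch below is not SCore-D or PM code. The class names (`NetworkContext`, `SingleContextNic`, `MultiContextNic`), the 4096-byte buffer size, and the `bytes_copied` counter are all hypothetical and chosen only for illustration. It counts how many bytes each approach would copy per context switch, assuming a context is a flat buffer.

```python
from dataclasses import dataclass, field

CONTEXT_SIZE = 4096  # hypothetical context size in bytes, for illustration only


@dataclass
class NetworkContext:
    """Toy stand-in for per-job network interface state."""
    job: str
    buffer: bytearray = field(default_factory=lambda: bytearray(CONTEXT_SIZE))


class SingleContextNic:
    """One resident context: every switch saves it and restores another."""

    def __init__(self, jobs):
        # Host-side copies of each job's context.
        self.saved = {job: NetworkContext(job) for job in jobs}
        # The single context resident on the (simulated) interface.
        self.active = NetworkContext(jobs[0])
        self.bytes_copied = 0

    def switch_to(self, job):
        # Save: copy the resident context out to the host-side copy.
        self.saved[self.active.job].buffer[:] = self.active.buffer
        # Restore: copy the next job's context onto the interface.
        self.active.buffer[:] = self.saved[job].buffer
        self.active.job = job
        self.bytes_copied += 2 * CONTEXT_SIZE


class MultiContextNic:
    """All contexts stay resident: a switch only changes a reference."""

    def __init__(self, jobs):
        self.contexts = {job: NetworkContext(job) for job in jobs}
        self.active = self.contexts[jobs[0]]
        self.bytes_copied = 0

    def switch_to(self, job):
        # Pointer switch: no context data is copied.
        self.active = self.contexts[job]


def run_time_slices(nic, jobs, slices):
    """Round-robin over the jobs and return the total bytes copied."""
    for i in range(slices):
        nic.switch_to(jobs[i % len(jobs)])
    return nic.bytes_copied


if __name__ == "__main__":
    jobs = ["A", "B"]
    print("single context:", run_time_slices(SingleContextNic(jobs), jobs, 10))
    print("multiple contexts:", run_time_slices(MultiContextNic(jobs), jobs, 10))
```

In this toy model the single-context variant copies two full contexts on every switch, while the multiple-context variant copies nothing. The model captures only the copying cost. It says nothing about the cache behavior or the measured overheads reported above.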