This paper describes the components of a runtime system for launching parallel applications and presents performance results for starting a job on more than a thousand nodes of a workstation cluster. The runtime system was developed at Sandia National Laboratories as part of the Computational Plant (Cplant™) project, which is deploying large-scale parallel computing clusters built from commodity hardware and the Linux operating system. We have designed and implemented a flexible runtime system that launches parallel jobs on thousands of nodes in a matter of seconds. We describe how the components interact and discuss the key issues that affect the scalability and performance of the runtime system. We also present performance results for launching executables of varying sizes on more than a thousand nodes.