Multitoroidal Interconnects For Tightly Coupled Supercomputers
IEEE Transactions on Parallel and Distributed Systems
The three-dimensional torus is a common topology for multicomputer network interconnects because of its simplicity and high scalability. A parallel job submitted to a three-dimensional toroidal machine typically requires an isolated, contiguous, rectangular partition connected as a mesh or a torus. Such partitioning leads to fragmentation and thus reduces the resource utilization of the machine. In particular, toroidal partitions often require the allocation of additional communication links to close the torus. If these links are treated as dedicated resources (because of the partition isolation requirement), they become unavailable to other partitions that could otherwise use them, which may prevent those partitions from being allocated. Overall, on toroidal machines, the likelihood of successfully allocating a new partition decreases as the number of toroidal partitions increases. This paper presents a novel "multi-toroidal" interconnect topology that can accommodate multiple adjacent meshed and toroidal partitions at the same time. We prove that this topology allows every free partition of the machine to be connected as a torus without affecting existing partitions. We also show that, for toroidal jobs, this interconnect topology increases machine utilization by a factor of 2 to 4 (depending on the workload) compared with three-dimensional toroidal machines. This effect holds across different scheduling policies. The BlueGene/L supercomputer being developed by IBM Research is an example of a multi-toroidal interconnect architecture.
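The link-contention problem described above can be illustrated with a deliberately simplified sketch. It is not the paper's model or algorithm: it assumes a single dimension of a conventional torus, represented as a ring in which link i joins node i and node (i + 1) mod N, and it assumes that a toroidal partition shorter than the ring closes its wrap-around through all of the ring's remaining links. The function names (`links_needed`, `nodes_of`, `can_coexist`) are hypothetical and chosen only for illustration.

```python
def nodes_of(start, length, ring_size):
    """Nodes occupied by a contiguous partition along one ring dimension."""
    return {(start + k) % ring_size for k in range(length)}


def links_needed(start, length, ring_size, toroidal):
    """Ring links a partition occupies (assumed model, see lead-in).

    Link i joins node i and node (i + 1) % ring_size.
    """
    if not 1 <= length <= ring_size:
        raise ValueError("length must be between 1 and ring_size")
    # Links between consecutive nodes inside the partition (mesh links).
    internal = {(start + k) % ring_size for k in range(length - 1)}
    if not toroidal or length == 1:
        return internal
    # Assumption: closing the torus routes the wrap-around from the last
    # node back to the first through every remaining link of the ring,
    # so the partition ends up holding all links in this dimension.
    return set(range(ring_size))


def can_coexist(a, b, ring_size):
    """True if partitions a and b share neither nodes nor dedicated links.

    Each partition is a tuple (start, length, toroidal).
    """
    nodes_a = nodes_of(a[0], a[1], ring_size)
    nodes_b = nodes_of(b[0], b[1], ring_size)
    links_a = links_needed(a[0], a[1], ring_size, a[2])
    links_b = links_needed(b[0], b[1], ring_size, b[2])
    return nodes_a.isdisjoint(nodes_b) and links_a.isdisjoint(links_b)


if __name__ == "__main__":
    ring = 8
    # Two meshes on disjoint nodes do not compete for links.
    print(can_coexist((0, 2, False), (4, 2, False), ring))
    # Once the first partition is toroidal, its wrap-around links
    # block the second partition even though the nodes are free.
    print(can_coexist((0, 2, True), (4, 2, False), ring))
```

Under these assumptions, a single small toroidal partition monopolizes the links of its ring, which is the effect the abstract attributes to treating wrap-around links as dedicated resources. How the multi-toroidal topology avoids this is not modeled here.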