Fat-tree routing and node ordering providing contention free traffic for MPI global collectives

Authors:
Eitan Zahavi
Venue:
Journal of Parallel and Distributed Computing
Year:
2012


Abstract

As High Performance Computing clusters grow, so does the probability of interconnect hot spots that degrade the latency and effective bandwidth the network provides. This paper presents a solution to this scalability problem for real-life constant bisectional-bandwidth fat-tree topologies. It is shown that maximal bandwidth and cut-through latency can be achieved for MPI global collective traffic. To form such a congestion-free configuration, three conditions must hold: MPI programs should use collective communication, the MPI node order should be topology-aware, and the packet routing should match the MPI communication patterns. First, we show that MPI collectives can be classified into unidirectional and bidirectional shifts. Using this property, we propose a scheme for congestion-free routing of the global collectives in fully and partially populated fat-trees running a single job. The no-contention result is then extended to multiple jobs running on the same fat-tree by applying restrictions on job size and placement. Simulations of the proposed routing, MPI node order and communication patterns show no contention, which provides a 40% throughput improvement over previously published results for all-to-all collectives.
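The interplay between node order, routing and shift patterns can be illustrated with a toy model. The sketch below is not the paper's algorithm or simulator. It assumes a hypothetical two-level fat-tree with k leaf switches, k hosts per leaf and k spine switches, and it assumes that each leaf picks its up-port as the destination host index modulo k (a common destination-based rule, chosen here only for illustration). The function name `max_link_load` and both example orderings are invented for this example. It counts how many flows share the busiest switch-to-switch link over every unidirectional shift stage, where rank r sends to rank (r + shift) mod N.

```python
from collections import Counter


def max_link_load(k, rank_to_host):
    """Worst number of flows sharing one switch-to-switch link over all shifts.

    Assumed toy topology: k leaves, k hosts per leaf, k spines (N = k*k hosts).
    Assumed routing: leaf up-port (spine index) = destination host index mod k.
    """
    n = k * k
    worst = 0
    for shift in range(1, n):
        load = Counter()
        for rank in range(n):
            src = rank_to_host[rank]
            dst = rank_to_host[(rank + shift) % n]
            src_leaf, dst_leaf = src // k, dst // k
            if src_leaf == dst_leaf:
                continue  # traffic stays inside the leaf switch
            spine = dst % k  # destination-based up-port choice (assumption)
            load[("up", src_leaf, spine)] += 1
            load[("down", spine, dst_leaf)] += 1
        worst = max(worst, max(load.values(), default=0))
    return worst


# Ranks assigned to hosts in topology order: consecutive ranks share a leaf.
topology_order = list(range(9))
# A hypothetical topology-unaware assignment of ranks to hosts.
scattered_order = [0, 3, 1, 6, 2, 4, 5, 7, 8]

print("topology-aware order:", max_link_load(3, topology_order))
print("scattered order:", max_link_load(3, scattered_order))
```

In this toy model the topology-aware order keeps every link at a load of one flow for all shifts, because the k consecutive destinations reached from one leaf have distinct residues modulo k. The scattered order places two flows on the same up-link already at shift 1 (hosts 0 and 1 both send toward destinations whose index is 0 modulo 3).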