Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms

Authors:
A. R. Mamidala;Jiuxing Liu;D. K. Panda
Affiliations:
Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA;Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA;Dept. of Comput. Sci. & Eng., Ohio State Univ., Columbus, OH, USA
Venue:
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
Year:
2004


Abstract

Popular algorithms proposed in the literature for Barrier and Allreduce on clusters, such as pair-wise exchange, dissemination, and gather-broadcast, do not perform optimally when there is skew among the nodes in the cluster. In pair-wise exchange and dissemination, all nodes must arrive before each step can complete. The gather-broadcast algorithm assumes a fixed tree topology. We propose using the hardware multicast of InfiniBand in the design of an adaptive algorithm that performs well in the presence of skew. In this approach, the topology of the tree is not fixed but adapts to the skew: the last arriving node becomes the root of the tree if the skew is sufficiently large. We have carried out an in-depth evaluation of our scheme, using synchronization delay as the performance metric for Barrier and Allreduce in the presence of skew. Our performance evaluation shows that our design scales very well with system size. Our designs can reduce the synchronization delay by a factor of 2.28 for Barrier and by a factor of 2.18 for Allreduce. We have examined different skew scenarios and shown that the adaptive design performs either better than or comparably to the existing schemes.
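
To make the effect of skew concrete, the following is a minimal timing-model sketch in Python, not the authors' implementation. It compares a dissemination barrier with a toy "last arriver becomes root" barrier. Everything in it is an assumption made for illustration: the function names, the cost parameters `t_step` (one point-to-point step) and `t_mcast` (one hardware multicast), the `skew_threshold` rule, the fallback to a fixed-tree gather, and the definition of synchronization delay as completion time minus the last arrival time.

```python
def num_rounds(n):
    """Number of rounds, ceil(log2(n)), computed without floating point."""
    return (n - 1).bit_length() if n > 1 else 0


def dissemination_finish(arrivals, t_step):
    """Completion time of a dissemination barrier.

    In round k, node i waits for a message from node (i - 2**k) % n,
    so every round is gated by the slowest partner seen so far.
    """
    n = len(arrivals)
    ready = list(arrivals)
    for k in range(num_rounds(n)):
        dist = 1 << k
        ready = [max(ready[i], ready[(i - dist) % n]) + t_step
                 for i in range(n)]
    return max(ready)


def adaptive_finish(arrivals, t_step, t_mcast, skew_threshold):
    """Completion time of a toy adaptive barrier (illustrative model only).

    Assumption: if the last node arrives at least `skew_threshold` after
    the second-to-last one, all other arrival notices have already been
    collected, so the last arriver acts as root and releases everyone
    with a single multicast. Otherwise the model falls back to a
    fixed-tree gather followed by a multicast release.
    """
    n = len(arrivals)
    if n < 2:
        return arrivals[0]
    ordered = sorted(arrivals)
    last, second_last = ordered[-1], ordered[-2]
    if last - second_last >= skew_threshold:
        return last + t_mcast
    return last + num_rounds(n) * t_step + t_mcast


def sync_delay(finish, arrivals):
    """Synchronization delay, assumed here to be finish minus last arrival."""
    return finish - max(arrivals)


if __name__ == "__main__":
    # Eight nodes, one of which arrives much later than the rest.
    skewed = [0.0] * 7 + [100.0]
    d = dissemination_finish(skewed, t_step=5.0)
    a = adaptive_finish(skewed, t_step=5.0, t_mcast=6.0, skew_threshold=20.0)
    print("dissemination delay:", sync_delay(d, skewed))
    print("adaptive delay:", sync_delay(a, skewed))
```

With these arbitrary cost values, the skewed case yields a delay of 15 time units for dissemination (three rounds after the late arrival) and 6 for the adaptive model (one multicast). The numbers are illustrative only and are unrelated to the measured factors of 2.28 and 2.18 reported above. With no skew, this toy model pays for the gather plus the multicast and is no faster than dissemination.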