All-to-all personalized exchange is the densest collective communication operation in the MPI specification: every process sends a different message to every other participating process. This collective operation is essential for many parallel scientific applications. As system and message sizes grow, it becomes challenging to offer a fast, scalable and efficient implementation of it. InfiniBand is an emerging modern interconnect that offers very low latency, high bandwidth and one-sided operations such as RDMA write. Advanced features such as RDMA write with gather allow all-to-all algorithms to be designed and implemented much more efficiently than in the past. Our aim is to design efficient and scalable implementations of traditional personalized exchange algorithms. We present two novel approaches to designing all-to-all algorithms, for short and long messages respectively. The Hypercube RDMA Write Gather and Direct Eager schemes leverage the RDMA and RDMA write with gather mechanisms offered by InfiniBand. Our performance evaluation shows that the designs reduce all-to-all communication time by up to a factor of 3.07 for 32-byte messages on a 16-node InfiniBand cluster. Our analytical models suggest that the proposed designs will perform 64% better on InfiniBand clusters with 1024 nodes for a 4k message size.
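To make the operation concrete, the sketch below simulates a textbook hypercube all-to-all personalized exchange in plain Python. It is not the Hypercube RDMA Write Gather scheme itself: there is no RDMA, no gather list and no network, and the function name `hypercube_alltoall` and the tuple layout are assumptions made purely for illustration. It only shows the general store-and-forward idea that, with P = 2^d processes, each process exchanges with one partner per step and every message reaches its destination after d = log2(P) steps.

```python
def hypercube_alltoall(send):
    """Simulate an all-to-all personalized exchange over a hypercube.

    send[i][j] is the message that process i wants delivered to process j.
    Returns (recv, steps) where recv[j][i] == send[i][j].
    Illustrative only: a sequential simulation, not an MPI or RDMA code.
    """
    p = len(send)
    if p == 0 or p & (p - 1):
        raise ValueError("process count must be a power of two")

    # held[r] is the list of (source, destination, payload) buffered at rank r.
    held = [[(i, j, send[i][j]) for j in range(p)] for i in range(p)]

    steps = 0
    bit = 1
    while bit < p:
        outgoing = []
        for rank in range(p):
            # Forward every message whose destination differs from this rank
            # in the current bit; keep the rest for later steps.
            keep = [m for m in held[rank] if (m[1] ^ rank) & bit == 0]
            forward = [m for m in held[rank] if (m[1] ^ rank) & bit != 0]
            held[rank] = keep
            outgoing.append(forward)
        for rank in range(p):
            # The partner in this step is the rank with the current bit flipped.
            held[rank ^ bit].extend(outgoing[rank])
        bit <<= 1
        steps += 1

    # After all steps, each rank holds exactly the messages addressed to it.
    recv = [[None] * p for _ in range(p)]
    for rank in range(p):
        for src, dst, payload in held[rank]:
            assert dst == rank
            recv[rank][src] = payload
    return recv, steps


if __name__ == "__main__":
    n = 8
    messages = [[f"{i}->{j}" for j in range(n)] for i in range(n)]
    received, step_count = hypercube_alltoall(messages)
    print(step_count)
    print(received[5])
```

In this simulation each process forwards P/2 buffered messages per step, so small messages are combined into a few larger transfers at the cost of being relayed through intermediate processes. A direct scheme would instead send each message straight to its destination with no relaying.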