Efficient and Scalable All-to-All Personalized Exchange for InfiniBand-Based Clusters

Authors:
Sayantan Sur;Hyun-Wook Jin;Dhabaleswar K. Panda
Affiliations:
Ohio State University;Ohio State University;Ohio State University
Venue:
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
Year:
2004

Citing 0
Cited 9

On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact
Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming

Efficient shared memory and RDMA based collectives on multi-rail QsNetII SMP clusters
Cluster Computing

Process Arrival Pattern and Shared Memory Aware Alltoall on InfiniBand
Proceedings of the 16th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Efficient RDMA-based multi-port collectives on multi-rail QsNetII clusters
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

A preliminary analysis of the infinipath and XD1 network interfaces
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Efficient shared memory and RDMA based design for MPI_Allgather over infiniband
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface

High performance RDMA based all-to-all broadcast for infiniband clusters
HiPC'05 Proceedings of the 12th international conference on High Performance Computing

Assessing the performance and scalability of a novel multilevel k-nomial allgather on CORE-Direct systems
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing

Fast pattern-specific routing for fat tree networks
ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

The All-to-All Personalized Exchange is the densest collective communication operation offered by the MPI specification: every process sends a different message to every other participating process. This collective operation is essential for many parallel scientific applications. As system and message sizes increase, it becomes challenging to offer a fast, scalable, and efficient implementation of this operation. InfiniBand is an emerging interconnect that offers very low latency, high bandwidth, and one-sided operations such as RDMA Write. Its advanced features, such as RDMA Write Gather, allow All-to-All algorithms to be designed and implemented much more efficiently than in the past. Our aim in this paper is to design efficient and scalable implementations of traditional personalized exchange algorithms. We present two novel approaches to designing All-to-All algorithms for short and long messages, respectively. The Hypercube RDMA Write Gather and Direct Eager schemes effectively leverage the RDMA and RDMA Write Gather mechanisms offered by InfiniBand. Performance evaluation of our design and implementation shows that it reduces All-to-All communication time by up to a factor of 3.07 for 32-byte messages on a 16-node InfiniBand cluster. Our analytical models suggest that the proposed designs will perform 64% better on InfiniBand clusters with 1024 nodes for a 4k message size.
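
To make the operation concrete, the following is a minimal sketch of the exchange semantics and of the textbook hypercube (store-and-forward) exchange pattern, simulated in a single Python process. It is not the authors' implementation: it uses no MPI or InfiniBand calls, the function names `direct_alltoall` and `hypercube_alltoall` are hypothetical, and the process count is assumed to be a power of two. In the direct pattern each process addresses every peer separately; in the hypercube pattern each process sends one combined message of P/2 blocks per step over log2(P) steps, which is the kind of combining that a gather-capable send could perform without an intermediate copy.

```python
def direct_alltoall(send):
    """Reference semantics: send[i][j] is the block rank i addresses to rank j.

    Returns recv, where recv[j][i] is the block rank j received from rank i.
    Each rank conceptually sends one message to every other rank.
    """
    p = len(send)
    return [[send[i][j] for i in range(p)] for j in range(p)]


def hypercube_alltoall(send):
    """Textbook hypercube all-to-all personalized exchange (simulated).

    Assumption: the number of ranks is a power of two.
    In step k, rank r exchanges with partner r XOR 2**k and forwards every
    block whose destination differs from r in bit k.
    """
    p = len(send)
    if p == 0 or p & (p - 1):
        raise ValueError("process count must be a power of two")
    dims = p.bit_length() - 1

    # held[r] maps (source, destination) -> block currently stored at rank r.
    held = [{(r, j): send[r][j] for j in range(p)} for r in range(p)]

    for k in range(dims):
        bit = 1 << k
        outgoing = []
        for r in range(p):
            # Blocks that must cross dimension k; there are exactly p/2 of
            # them, and they travel together as one combined message.
            keys = [key for key in held[r] if (key[1] ^ r) & bit]
            outgoing.append({key: held[r].pop(key) for key in keys})
        for r in range(p):
            # Deliver the combined message to the partner in dimension k.
            held[r ^ bit].update(outgoing[r])

    # Every block now sits at its destination; order them by source rank.
    return [[held[r][(src, r)] for src in range(p)] for r in range(p)]


if __name__ == "__main__":
    blocks = [[f"{i}->{j}" for j in range(4)] for i in range(4)]
    print(hypercube_alltoall(blocks)[3])  # blocks delivered to rank 3
```

Both functions produce the same result; they differ only in how many messages each rank sends (P - 1 separate blocks versus log2(P) combined messages), which is why the trade-off between them depends on message size.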