Efficient and Scalable All-to-All Personalized Exchange for InfiniBand-Based Clusters

  • Authors:
  • Sayantan Sur;Hyun-Wook Jin;Dhabaleswar K. Panda

  • Affiliations:
  • Ohio State University;Ohio State University;Ohio State University

  • Venue:
  • ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
  • Year:
  • 2004

Abstract

All-to-All Personalized Exchange is the densest collective communication operation offered by the MPI specification. In this operation, every process sends a different message to every other participating process. It is essential for many parallel scientific applications. As system and message sizes grow, it becomes challenging to offer a fast, scalable and efficient implementation of this operation. InfiniBand is an emerging interconnect that offers very low latency, high bandwidth and one-sided operations such as RDMA write. Advanced features such as RDMA write gather allow All-to-All algorithms to be designed and implemented much more efficiently than in the past. Our aim in this paper is to design efficient and scalable implementations of traditional personalized exchange algorithms. We present two novel approaches to designing All-to-All algorithms, for short and long messages respectively. The Hypercube RDMA Write Gather and Direct Eager schemes effectively leverage the RDMA and RDMA write gather mechanisms offered by InfiniBand. Performance evaluation of our design and implementation shows that it reduces All-to-All communication time by up to a factor of 3.07 for 32-byte messages on a 16-node InfiniBand cluster. Our analytical models suggest that the proposed designs will perform 64% better on InfiniBand clusters with 1024 nodes for a 4k message size.
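
To make the operation concrete, the following is a minimal sketch of the data movement in a generic hypercube-style personalized exchange, simulated in plain Python. It is not the authors' implementation: it uses no MPI, no InfiniBand and no RDMA, and the function name `hypercube_alltoall` and the `send[i][j]` layout are assumptions made only for illustration. It assumes the number of processes is a power of two.

```python
def hypercube_alltoall(send):
    """Simulate an all-to-all personalized exchange over a hypercube.

    send[i][j] is the message process i wants to deliver to process j.
    Returns recv, where recv[j][i] is the message process j got from i.
    Illustrative sketch only; assumes a power-of-two process count.
    """
    p = len(send)
    if p == 0 or p & (p - 1) != 0:
        raise ValueError("process count must be a power of two")

    # held[r] maps (source, destination) -> payload for every message
    # currently stored at rank r. Initially each rank holds its own row.
    held = [{(r, d): send[r][d] for d in range(p)} for r in range(p)]

    steps = p.bit_length() - 1  # log2(p) exchange steps
    for k in range(steps):
        bit = 1 << k

        # Each rank picks the messages whose destination differs from its
        # own rank in bit k; these must cross dimension k of the hypercube.
        outgoing = []
        for r in range(p):
            out = {key: val for key, val in held[r].items()
                   if (key[1] ^ r) & bit}
            outgoing.append(out)

        # Remove the forwarded messages from every sender first, so the
        # simulated step behaves like a simultaneous pairwise exchange.
        for r in range(p):
            for key in outgoing[r]:
                del held[r][key]

        # Deliver each batch to the partner across dimension k.
        for r in range(p):
            held[r ^ bit].update(outgoing[r])

    # After log2(p) steps every message sits at its destination rank.
    return [[held[r][(s, r)] for s in range(p)] for r in range(p)]
```

In this sketch each rank forwards half of the messages it currently holds in every step, so the exchange finishes in log2(P) steps instead of the P - 1 separate sends per process that a direct exchange would use. That trade of more data per step for fewer steps is the usual reason hypercube schemes are associated with short messages.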