On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact

  • Authors:
  • Amith R. Mamidala;Sundeep Narravula;Abhinav Vishnu;Gopal Santhanaraman;Dhabaleswar K. Panda

  • Affiliations:
  • The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH

  • Venue:
  • Proceedings of the 12th ACM SIGPLAN symposium on Principles and practice of parallel programming
  • Year:
  • 2007

Abstract

The communication subsystem plays a pivotal role in achieving scalable performance in clusters. The communication semantics employed are dictated by the programming model used by the application, such as MPI or UPC. Among the available communication primitives, collective and one-sided operations are especially significant, and they must be designed to harness the capabilities and features exposed by the underlying network. In some cases, there is a direct match between the semantics of the operations and the underlying network primitives. InfiniBand provides two transport modes: (i) connection-oriented Reliable Connection (RC), supporting both Memory and Channel semantics, and (ii) connection-less Unreliable Datagram (UD), supporting only Channel semantics. Achieving good performance and scalability requires careful analysis and design of communication primitives based on these options. In this paper, we evaluate the scalability and performance trade-offs between the RC and UD transport modes. We study the semantic advantages of mapping collective and one-sided operations onto the memory and channel semantics of InfiniBand (IBA). We take AlltoAll as a case study to demonstrate the benefits of RDMA over Send/Recv and to show the performance/memory trade-offs across IB transports. Our experimental results show that UD-based AlltoAll performs 38% better than Bruck's algorithm for short messages and up to two times better than the direct AlltoAll over RC. Since InfiniBand does not provide RDMA over UD in hardware, we emulate it in our study. Our results show a performance degradation of up to a factor of three for emulated RDMA Read latency compared to RC, highlighting the need for hardware-based RDMA operations over UD.
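The two AlltoAll baselines named in the abstract can be illustrated with a small single-process simulation. This is a minimal sketch, not the authors' implementation: it uses no MPI or InfiniBand calls, the function names `direct_alltoall` and `bruck_alltoall` are hypothetical, and it counts point-to-point messages only, ignoring message sizes and transport effects. It assumes the textbook form of Bruck's algorithm (local rotation, about log2(P) exchange rounds, inverse rotation).

```python
def direct_alltoall(send):
    """Direct exchange: every rank sends one block to every other rank.

    send[i][j] is the block that rank i wants delivered to rank j.
    Returns (recv, messages), where recv[j][i] == send[i][j].
    """
    p = len(send)
    recv = [[None] * p for _ in range(p)]
    messages = 0
    for src in range(p):
        for dst in range(p):
            recv[dst][src] = send[src][dst]
            if src != dst:
                messages += 1  # one point-to-point message per remote peer
    return recv, messages


def bruck_alltoall(send):
    """Textbook Bruck-style exchange, simulated for all ranks at once.

    Each rank sends one combined message per round, so the total message
    count is P * ceil(log2(P)) instead of P * (P - 1).
    """
    p = len(send)
    # Phase 1: local rotation, so slot j on rank i holds the block
    # destined for rank (i + j) % p, i.e. it must travel a distance of j.
    buf = [[send[i][(i + j) % p] for j in range(p)] for i in range(p)]

    messages = 0
    k = 1
    while k < p:
        # Phase 2: in the round for distance k, rank i forwards to rank
        # (i + k) % p every slot j whose index has bit k set.
        new_buf = [row[:] for row in buf]
        for src in range(p):
            dst = (src + k) % p
            for j in range(p):
                if j & k:
                    new_buf[dst][j] = buf[src][j]
            messages += 1  # all selected slots travel in one message
        buf = new_buf
        k <<= 1

    # Phase 3: inverse rotation. Slot j on rank i now holds the block
    # that originated at rank (i - j) % p.
    recv = [[None] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            recv[i][(i - j) % p] = buf[i][j]
    return recv, messages


if __name__ == "__main__":
    p = 8
    send = [[(i, j) for j in range(p)] for i in range(p)]
    direct, direct_msgs = direct_alltoall(send)
    bruck, bruck_msgs = bruck_alltoall(send)
    print(direct == bruck, direct_msgs, bruck_msgs)
```

Under these assumptions, Bruck's algorithm trades fewer, larger messages for extra local copying and forwarding, while the direct exchange sends each block exactly once to its final destination. The sketch says nothing about which approach is faster on a given transport; that comparison is what the paper measures.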