X-SRQ - Improving Scalability and Performance of Multi-core InfiniBand Clusters

  • Authors:
  • Galen M. Shipman, Stephen Poole, Pavel Shamis, Ishai Rabinovitz

  • Affiliations:
  • Oak Ridge National Laboratory, Oak Ridge, TN, USA (Shipman, Poole); Mellanox Technologies, Yokneam, Israel (Shamis, Rabinovitz)

  • Venue:
  • Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
  • Year:
  • 2008

Abstract

To improve the scalability of InfiniBand on large-scale clusters, Open MPI introduced a protocol known as B-SRQ [2]. This protocol was shown to provide much better memory utilization of send and receive buffers for a wide variety of benchmarks and real-world applications. Unfortunately, B-SRQ increases the number of connections between communicating peers: while addressing one scalability problem of InfiniBand, the protocol introduced another. To alleviate the connection scalability problem of the B-SRQ protocol, a small enhancement to the reliable connection transport was requested which allows multiple shared receive queues to be attached to a single reliable connection. This modified reliable connection transport is now known as the extended reliable connection (XRC) transport. X-SRQ is a new transport protocol in Open MPI, based on B-SRQ, which takes advantage of this improvement in connection scalability. This paper introduces the X-SRQ protocol and details its significantly improved scalability over B-SRQ, including a reduction of the memory footprint of connection state by as much as two orders of magnitude on large-scale multi-core systems. In addition to improving scalability, the protocol improves the performance of latency-sensitive collective operations by up to 38% while significantly decreasing the variability of results. A detailed analysis of the improved memory scalability and the improved performance is presented.
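The mechanism the abstract describes is that a single XRC send queue pair can address any of several shared receive queues on the remote node, so connection state no longer multiplies with the number of receive-buffer buckets. The sketch below illustrates that idea using the modern rdma-core verbs API; note this is an illustrative approximation, not Open MPI's actual implementation, and the 2008 paper used Mellanox's earlier XRC extension API, whose function names differ. Error handling is abbreviated, and the bucket count of 4 is an arbitrary assumption.

    /* Sketch: several XRC SRQs reachable through one XRC send QP
     * (modern rdma-core verbs; illustrative only). */
    #include <stdio.h>
    #include <fcntl.h>
    #include <infiniband/verbs.h>

    int main(void)
    {
        int num;
        struct ibv_device **devs = ibv_get_device_list(&num);
        if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

        struct ibv_context *ctx = ibv_open_device(devs[0]);
        struct ibv_pd *pd = ibv_alloc_pd(ctx);
        struct ibv_cq *cq = ibv_create_cq(ctx, 128, NULL, NULL, 0);

        /* An XRC domain groups the SRQs that remote XRC send QPs may target. */
        struct ibv_xrcd_init_attr xattr = {
            .comp_mask = IBV_XRCD_INIT_ATTR_FD | IBV_XRCD_INIT_ATTR_OFLAGS,
            .fd        = -1,      /* process-private domain */
            .oflags    = O_CREAT,
        };
        struct ibv_xrcd *xrcd = ibv_open_xrcd(ctx, &xattr);

        /* One XRC SRQ per receive-buffer bucket (the B-SRQ/X-SRQ scheme).
         * All of them are reachable through ONE XRC send QP on the sender. */
        enum { NBUCKETS = 4 };  /* assumed bucket count, for illustration */
        struct ibv_srq *srqs[NBUCKETS];
        for (int i = 0; i < NBUCKETS; i++) {
            struct ibv_srq_init_attr_ex sattr = {
                .attr      = { .max_wr = 256, .max_sge = 1 },
                .comp_mask = IBV_SRQ_INIT_ATTR_TYPE | IBV_SRQ_INIT_ATTR_XRCD |
                             IBV_SRQ_INIT_ATTR_CQ   | IBV_SRQ_INIT_ATTR_PD,
                .srq_type  = IBV_SRQT_XRC,
                .xrcd      = xrcd,
                .cq        = cq,
                .pd        = pd,
            };
            srqs[i] = ibv_create_srq_ex(ctx, &sattr);
            /* The SRQ number a sender places in its work request to pick
             * this bucket: */
            uint32_t srqn;
            ibv_get_srq_num(srqs[i], &srqn);
            printf("bucket %d -> SRQ number 0x%x\n", i, srqn);
        }

        /* A single XRC send QP suffices to reach every SRQ above; the
         * sender selects the bucket per message via the remote_srqn
         * field (wr.qp_type.xrc.remote_srqn) of ibv_post_send(). */
        struct ibv_qp_init_attr_ex qattr = {
            .qp_type   = IBV_QPT_XRC_SEND,
            .send_cq   = cq,
            .cap       = { .max_send_wr = 128, .max_send_sge = 1 },
            .comp_mask = IBV_QP_INIT_ATTR_PD,
            .pd        = pd,
        };
        struct ibv_qp *qp = ibv_create_qp_ex(ctx, &qattr);
        (void)qp;
        /* ... connection establishment and posting of sends omitted ... */
        return 0;
    }

With plain reliable connections, each bucket would require its own QP per peer, so per-node connection state grows with the bucket count; decoupling the SRQ choice from the QP is what yields the memory-footprint reduction the paper reports.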