High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters

  • Authors:
  • Matthew J. Koop;Sayantan Sur;Qi Gao;Dhabaleswar K. Panda

  • Affiliations:
  • The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH;The Ohio State University, Columbus, OH

  • Venue:
  • Proceedings of the 21st annual international conference on Supercomputing
  • Year:
  • 2007

Abstract

High-performance clusters have been growing rapidly in scale. Most of these clusters deploy a high-speed interconnect, such as InfiniBand, to achieve higher performance. Most scientific applications executing on these clusters use the Message Passing Interface (MPI) as the parallel programming model. Thus, the MPI library has a key role in achieving application performance: it must consume as few resources as possible while enabling scalable performance. State-of-the-art MPI implementations over InfiniBand primarily use the Reliable Connection (RC) transport due to its good performance and attractive features. However, the RC transport requires a connection between every pair of communicating processes, with each connection requiring several KB of memory. As clusters continue to scale, the memory requirements of RC-based implementations increase accordingly. The connection-less Unreliable Datagram (UD) transport is an attractive alternative, which eliminates the need to dedicate memory to each pair of processes. In this paper we present a high-performance UD-based MPI design. We implement our design and compare its performance and resource usage with the RC-based MVAPICH. We evaluate NPB, SMG2000, Sweep3D, and sPPM on up to 4K processes on a 9216-core InfiniBand cluster. For SMG2000, our prototype shows a 60% speedup and a seven-fold reduction in memory for 4K processes. Additionally, based on our model, our design has an estimated 30-fold reduction in memory over MVAPICH at 16K processes when all connections are created. To the best of our knowledge, this is the first research work that presents a high-performance MPI design over InfiniBand that is completely based on UD and can achieve near-identical or better application performance than RC.
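The scaling argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. This is not the memory model used in the paper: the function names are hypothetical, and the 4 KiB per-connection and per-endpoint figures are placeholder assumptions standing in for the abstract's "several KB". The sketch only shows why connection memory per process grows with the number of peers under RC when all connections are created, while a connection-less endpoint does not grow with the peer count.

```python
# Illustrative only: hypothetical names and placeholder sizes, not the paper's model.

ASSUMED_BYTES_PER_RC_CONNECTION = 4 * 1024  # assumption: stands in for "several KB"
ASSUMED_BYTES_PER_UD_ENDPOINT = 4 * 1024    # assumption: one shared endpoint per process


def rc_memory_bytes(num_processes, bytes_per_connection):
    """Connection memory per process when it holds an RC connection to every other process."""
    return (num_processes - 1) * bytes_per_connection


def ud_memory_bytes(num_processes, bytes_per_endpoint):
    """Endpoint memory per process for a connection-less transport.

    In this simplified sketch the cost does not depend on num_processes,
    because one endpoint is shared by all peers.
    """
    return bytes_per_endpoint


if __name__ == "__main__":
    for n in (1024, 4096, 16384):
        rc = rc_memory_bytes(n, ASSUMED_BYTES_PER_RC_CONNECTION)
        ud = ud_memory_bytes(n, ASSUMED_BYTES_PER_UD_ENDPOINT)
        print(f"{n:6d} processes: RC ~{rc / 2**20:.1f} MiB/process, UD ~{ud / 2**10:.1f} KiB/process")
```

The sketch counts connection state only. A real implementation also allocates communication buffers and, for UD, per-peer bookkeeping for reliability, so the ratios it prints should not be read as the reductions reported in the abstract.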