Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation

  • Authors:
  • Jiuxing Liu, Abhinav Vishnu, Dhabaleswar K. Panda

  • Affiliations:
  • Ohio State University (all authors)

  • Venue:
  • Proceedings of the 2004 ACM/IEEE Conference on Supercomputing (SC '04)
  • Year:
  • 2004

Abstract

In the area of cluster computing, InfiniBand is becoming increasingly popular due to its open standard and high performance. However, even with InfiniBand, network bandwidth can still become the performance bottleneck for some of today's most demanding applications. In this paper, we study how to overcome this bandwidth bottleneck by using multirail networks. We present different ways of setting up multirail networks with InfiniBand and propose a unified MPI design that can support all of them. We also discuss important design issues and provide an in-depth discussion of different policies for using multirail networks, including an adaptive striping scheme that can dynamically change the striping parameters based on current system conditions. We have implemented our design and evaluated it using both microbenchmarks and applications. Our performance results show that multirail networks can significantly improve MPI communication performance. With a two-rail InfiniBand cluster, we have achieved almost twice the bandwidth and half the latency for large messages compared with the original MPI. At the application level, the multirail MPI can significantly reduce communication time as well as running time, depending on the communication pattern. We have also shown that the adaptive striping scheme can achieve excellent performance without a priori knowledge of the bandwidth of each rail.
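To make the striping ideas in the abstract concrete, the following is a minimal C sketch of weighted message striping with feedback-driven weight adjustment. It is not the paper's actual implementation: the rail count, the rail_weight array, the post_send_on_rail stub, the smoothing factor alpha, and the update rule are all illustrative assumptions standing in for the real MPI/InfiniBand internals.

    #include <stdio.h>
    #include <stddef.h>

    #define NUM_RAILS 2

    /* Relative striping weights, one per rail (assumed representation). */
    static double rail_weight[NUM_RAILS] = { 1.0, 1.0 };

    /* Stub: a real implementation would post an RDMA operation on the
     * given rail here. */
    static void post_send_on_rail(int rail, const char *buf, size_t len)
    {
        (void)buf;
        printf("rail %d: %zu bytes\n", rail, len);
    }

    /* Stripe a large message across the rails in proportion to the
     * current weights. */
    static void stripe_send(const char *buf, size_t len)
    {
        double total = 0.0;
        for (int r = 0; r < NUM_RAILS; r++)
            total += rail_weight[r];

        size_t offset = 0;
        for (int r = 0; r < NUM_RAILS; r++) {
            /* The last rail takes the remainder to avoid rounding loss. */
            size_t chunk = (r == NUM_RAILS - 1)
                ? len - offset
                : (size_t)(len * rail_weight[r] / total);
            if (chunk > 0)
                post_send_on_rail(r, buf + offset, chunk);
            offset += chunk;
        }
    }

    /* Adaptive feedback: after a striped send completes, nudge each
     * weight toward that rail's observed throughput, so faster rails
     * receive larger stripes next time (hypothetical update rule). */
    static void update_weights(const double observed_bw[NUM_RAILS])
    {
        const double alpha = 0.25;  /* smoothing factor, assumed */
        for (int r = 0; r < NUM_RAILS; r++)
            rail_weight[r] = (1.0 - alpha) * rail_weight[r]
                           + alpha * observed_bw[r];
    }

    int main(void)
    {
        static char msg[1 << 20];                  /* 1 MB dummy message */
        stripe_send(msg, sizeof msg);
        double bw[NUM_RAILS] = { 900.0, 700.0 };   /* hypothetical MB/s */
        update_weights(bw);
        stripe_send(msg, sizeof msg);              /* stripes now skewed */
        return 0;
    }

The point of the sketch is the feedback loop: stripe sizes follow the weights, and the weights drift toward each rail's measured bandwidth, which is how an adaptive scheme can perform well without a priori knowledge of per-rail bandwidth, as the abstract claims.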