Designing Zero-Copy Message Passing Interface Derived Datatype Communication over InfiniBand: Alternative Approaches and Performance Evaluation

  • Authors:
  • Gopalakrishnan Santhanaraman;Jiesheng Wu;Wei Huang;Dhabaleswar K. Panda

  • Affiliations:
  • Network-Based Computing Laboratory, Department of Computer Science and Engineering, The Ohio State University, Columbus, OH 43210, USA (all four authors)

  • Venue:
  • International Journal of High Performance Computing Applications
  • Year:
  • 2005

Abstract

In this paper, we present a new scheme, Send Gather Receive Scatter (SGRS), for zero-copy datatype communication over InfiniBand. The scheme leverages the gather/scatter feature provided by InfiniBand channel semantics: the Send Gather and Receive Scatter operations can handle non-contiguous data on both the send and receive sides. We have implemented this design and evaluated its performance with Message Passing Interface (MPI) level point-to-point microbenchmarks and collectives, on PCI-X systems and on upcoming high-performance PCI-Express systems. In our previous work, we proposed an alternative zero-copy approach using multiple RDMA Writes (Multi-W). Compared with the existing Multi-W zero-copy datatype scheme, the SGRS scheme overcomes the drawbacks of low network utilization and high startup cost. On PCI-X platforms, our experimental results show significant improvement in both point-to-point and collective datatype communication. Compared with the Multi-W scheme, the latency of a vector datatype is reduced by up to 62% and the bandwidth improves by up to 400%. The Alltoall collective shows up to a 23% reduction in latency. Further, the SGRS scheme shows low CPU overhead, which promises better overlap of computation and communication. The experimental results on PCI-Express platforms demonstrate the relevance of zero-copy protocols in overcoming memory bandwidth limitations. The trends observed on PCI-X platforms are magnified on PCI-Express platforms, with even higher improvements for the microbenchmarks and collectives.
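
To make the gather/scatter idea in the abstract concrete, the following is a minimal sketch in plain Python. It is a toy model only, not the authors' implementation, and it involves no InfiniBand verbs or MPI calls. The helper names (`vector_segments`, `gather`, `scatter`) and the byte-sized layouts are assumptions introduced for illustration. The sketch flattens a vector-style datatype into contiguous (offset, length) segments, reads the sender's segments in order, and places the resulting byte stream into a differently strided receiver layout. On real hardware the adapter would walk such segment lists itself; the intermediate `wire` object here exists only so the model can be inspected.

```python
from typing import List, Tuple

Segment = Tuple[int, int]  # (byte offset, byte length); hypothetical representation


def vector_segments(count: int, blocklen: int, stride: int) -> List[Segment]:
    """Flatten a vector-style layout (count blocks of blocklen bytes,
    block starts spaced stride bytes apart) into contiguous segments."""
    return [(i * stride, blocklen) for i in range(count)]


def gather(buf: bytes, segments: List[Segment]) -> bytes:
    """Model the send side: read each segment in order, as a gather list would."""
    return b"".join(buf[off:off + length] for off, length in segments)


def scatter(stream: bytes, buf: bytearray, segments: List[Segment]) -> None:
    """Model the receive side: place the incoming stream into each segment in order."""
    pos = 0
    for off, length in segments:
        buf[off:off + length] = stream[pos:pos + length]
        pos += length


# Sender layout: 4 blocks of 2 bytes, stride 4 bytes (8 payload bytes in total).
send_buf = bytes(range(16))
send_segs = vector_segments(count=4, blocklen=2, stride=4)

# Receiver layout: 2 blocks of 4 bytes, stride 6 bytes (also 8 payload bytes).
recv_buf = bytearray(12)
recv_segs = vector_segments(count=2, blocklen=4, stride=6)

wire = gather(send_buf, send_segs)   # stands in for the bytes on the network
scatter(wire, recv_buf, recv_segs)
```

In this model the sender and receiver layouts differ, yet both describe the same number of payload bytes, which is the situation the abstract refers to when it mentions non-contiguity on both sides. The number of entries in `send_segs` also gives a rough feel for how many separate operations a one-write-per-block approach would need for the same transfer, under the assumption that each contiguous block maps to one write.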