Supporting MPI-2 one sided communication on multi-rail InfiniBand clusters: design challenges and performance benefits

  • Authors:
  • Abhinav Vishnu; Gopal Santhanaraman; Wei Huang; Hyun-Wook Jin; Dhabaleswar K. Panda

  • Affiliations:
  • Department of Computer Science and Engineering, The Ohio State University, Columbus, OH (all authors)

  • Venue:
  • HiPC '05: Proceedings of the 12th International Conference on High Performance Computing
  • Year:
  • 2005

Abstract

In cluster computing, InfiniBand has emerged as a popular high-performance interconnect, with MPI as the de facto programming model. However, even with InfiniBand, bandwidth can become a bottleneck for clusters executing communication-intensive applications. Multi-rail cluster configurations with MPI-1 have been proposed to alleviate this problem. Recently, MPI-2, with its support for one-sided communication, has been gaining significance. In this paper, we take on the challenge of designing high-performance MPI-2 one-sided communication for multi-rail InfiniBand clusters. We propose a unified MPI-2 design for different multi-rail configurations (multiple ports, multiple HCAs, and combinations thereof). We examine issues associated with one-sided communication, such as multiple synchronization messages, scheduling of RDMA (read, write) operations, and ordering relaxation, and discuss their implications for our design. Our performance results show that multi-rail networks can significantly improve MPI-2 one-sided communication performance. Using PCI-Express with two ports, we achieve a peak MPI_Put bidirectional bandwidth of 2620 MB/s (million bytes per second), compared to 1910 MB/s for the single-rail implementation. For PCI-X with two HCAs, we nearly double the throughput and halve the latency for large messages.
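
For readers unfamiliar with the MPI-2 one-sided model evaluated above, the following minimal sketch shows how MPI_Put bandwidth is typically measured: a window is exposed with MPI_Win_create, data is written into the peer's window with MPI_Put, and MPI_Win_fence provides the synchronization. This is an illustrative example only, not the paper's benchmark code; the message size, iteration count, and ring choice of peer are assumptions.

/* Minimal MPI-2 one-sided bandwidth sketch (illustrative, not the paper's benchmark). */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_SIZE   (1 << 20)   /* 1 MB message size, assumed for illustration */
#define ITERATIONS 100         /* iteration count, assumed for illustration */

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    char *win_buf = malloc(MSG_SIZE);   /* memory exposed to remote puts */
    char *src_buf = malloc(MSG_SIZE);   /* local source buffer */

    /* Expose win_buf as an RMA window on every process. */
    MPI_Win win;
    MPI_Win_create(win_buf, MSG_SIZE, 1, MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int peer = (rank + 1) % nprocs;     /* simple ring of peers (assumption) */

    MPI_Win_fence(0, win);
    double start = MPI_Wtime();
    for (int i = 0; i < ITERATIONS; i++) {
        /* One-sided put: the target does not post a matching receive. */
        MPI_Put(src_buf, MSG_SIZE, MPI_BYTE, peer, 0, MSG_SIZE, MPI_BYTE, win);
        MPI_Win_fence(0, win);          /* active-target synchronization */
    }
    double elapsed = MPI_Wtime() - start;

    if (rank == 0)
        printf("Per-process MPI_Put bandwidth: %.1f MB/s\n",
               (double)MSG_SIZE * ITERATIONS / elapsed / 1e6);

    MPI_Win_free(&win);
    free(win_buf);
    free(src_buf);
    MPI_Finalize();
    return 0;
}

With processes putting to each other simultaneously (as in the ring above), the aggregate traffic on a node is bidirectional, which is the kind of load under which the paper reports its multi-rail bandwidth gains.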