Asynchronous MPI messaging on Myrinet
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
High performance RDMA-based MPI implementation over InfiniBand
ICS '03 Proceedings of the 17th annual international conference on Supercomputing
An MPI Library which uses Polling, Interrupts and Remote Copying for the Fujitsu AP1000+
ISPAN '96 Proceedings of the 1996 International Symposium on Parallel Architectures, Algorithms and Networks
CLUSTER '04 Proceedings of the 2004 IEEE International Conference on Cluster Computing
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
HPCS '08 Proceedings of the 2008 22nd International Symposium on High Performance Computing Systems and Applications
Tolerating message latency through the early release of blocked receives
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Proceedings of the 20th European MPI Users' Group Meeting
Message Passing Interface (MPI) point-to-point communication is usually realized with two protocols: the eager protocol for small messages and the rendezvous protocol for medium-sized and large messages. Traditional sender-initiated rendezvous protocols are sub-optimal in many situations. In this work, we propose to refine the rendezvous protocol on RDMA-enabled clusters with three protocols, each customized for a different situation: a hybrid protocol for medium-sized messages when the sender arrives early, a sender-initiated protocol for large messages when the sender arrives early, and a receiver-initiated protocol when the receiver arrives early. Compared with traditional sender-initiated rendezvous protocols, the proposed scheme reduces unnecessary synchronization, decreases the number of control messages on the critical path of communication, and improves communication progress, which results in significantly better communication-computation overlap. We present and analyze these protocols and describe how they and the eager protocol can be integrated seamlessly into one system without introducing an excessive number of control messages. We have implemented the proposed scheme for InfiniBand clusters, and the experimental results demonstrate its effectiveness.
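The case analysis in the abstract (message size class crossed with which side arrives first) can be illustrated with a small selection function. This is only a sketch under stated assumptions, not the authors' implementation: the thresholds `EAGER_LIMIT` and `MEDIUM_LIMIT`, the function `select_protocol`, and the boolean `receiver_arrived_first` are hypothetical names and values chosen for illustration, since the abstract gives no numeric cutoffs and does not describe how arrival order is detected. A real implementation would presumably have to infer arrival order from control messages rather than from a flag.

```python
from enum import Enum


class Protocol(Enum):
    EAGER = "eager"
    HYBRID = "hybrid"
    SENDER_INITIATED = "sender-initiated rendezvous"
    RECEIVER_INITIATED = "receiver-initiated rendezvous"


# Hypothetical size thresholds in bytes. The abstract only distinguishes
# small, medium-sized, and large messages; these numbers are assumptions.
EAGER_LIMIT = 8 * 1024
MEDIUM_LIMIT = 64 * 1024


def select_protocol(size: int, receiver_arrived_first: bool) -> Protocol:
    """Pick a protocol from message size and arrival order.

    Mirrors the four cases named in the abstract. Treating arrival order
    as a known boolean is a simplification for illustration.
    """
    if size <= EAGER_LIMIT:
        # Small messages use the eager protocol regardless of arrival order.
        return Protocol.EAGER
    if receiver_arrived_first:
        # Receiver arrives early: receiver-initiated protocol.
        return Protocol.RECEIVER_INITIATED
    if size <= MEDIUM_LIMIT:
        # Sender arrives early with a medium-sized message: hybrid protocol.
        return Protocol.HYBRID
    # Sender arrives early with a large message: sender-initiated protocol.
    return Protocol.SENDER_INITIATED
```

For example, under these assumed thresholds `select_protocol(32 * 1024, False)` selects the hybrid protocol, while the same message size with `receiver_arrived_first=True` selects the receiver-initiated protocol.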