This paper proposes a novel optimization technique for MPI persistent communication that uses multiple RDMA engines to achieve low-latency communication. Because the interconnects of modern supercomputers have multiple RDMA engines, the multiple communication requests specified by a persistent communication invocation can be scheduled onto those engines in an optimal way, resulting in better communication performance. Such scheduling is not only a packing problem; it must also avoid interconnect resource contention as much as possible. The scheduling algorithm proposed in this paper balances the load across RDMA engines and mitigates network link contention for neighbor communication patterns such as those found in stencil computations. The proposed algorithm is implemented in Open MPI on the K computer using RDMA functions provided as Fujitsu MPI extensions. A typical 2D stencil computation from a climate simulation code is used as the benchmark program. The experimental results show a speedup of a factor of two in communication time.
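The load-balancing part of this idea can be illustrated with a minimal sketch. It is not the algorithm from the paper, whose details the abstract does not give. It substitutes a generic greedy heuristic (largest request first, placed on the least-loaded engine) and ignores link contention entirely. The function name `schedule_requests`, the use of message size in bytes as the cost of a request, and the example sizes are all assumptions made for illustration.

```python
import heapq


def schedule_requests(sizes, num_engines):
    """Greedy sketch: assign each request to the least-loaded engine.

    sizes       -- message size in bytes for each request (assumed cost model)
    num_engines -- number of RDMA engines available (hypothetical parameter)

    Returns (assignment, loads), where assignment maps a request index to an
    engine index and loads[e] is the total number of bytes given to engine e.
    This balances load only; it does not model network link contention.
    """
    if num_engines < 1:
        raise ValueError("num_engines must be at least 1")

    # Min-heap of (current load, engine index); the least-loaded engine pops first.
    heap = [(0, engine) for engine in range(num_engines)]
    heapq.heapify(heap)

    assignment = {}
    # Place the largest requests first so the small ones can even out the loads.
    for idx in sorted(range(len(sizes)), key=lambda i: -sizes[i]):
        load, engine = heapq.heappop(heap)
        assignment[idx] = engine
        heapq.heappush(heap, (load + sizes[idx], engine))

    loads = [0] * num_engines
    for idx, engine in assignment.items():
        loads[engine] += sizes[idx]
    return assignment, loads


# Illustrative halo exchange of a 2D stencil: two long edges and two short edges.
assignment, loads = schedule_requests([4096, 4096, 64, 64], num_engines=2)
print(assignment, loads)
```

In this example each engine receives one long edge and one short edge, so both carry the same number of bytes. A contention-aware scheduler of the kind the abstract describes would also have to consider which network links each request traverses, which this sketch does not attempt.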