SKaMPI: A Detailed, Accurate MPI Benchmark
Proceedings of the 5th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters
Proceedings of the 21st annual international conference on Supercomputing
Infiniband scalability in open MPI
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Investigations on InfiniBand: efficient network buffer utilization at scale
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Hi-index | 0.00 |
To improve the scalability of InfiniBand on large scale clusters Open MPI introduced a protocol known as B-SRQ[2]. This protocol was shown to provide much better memory utilization of send and receive buffers for a wide variety of benchmarks and real-world applications.Unfortunately B-SRQincreases the number of connections between communicating peers. While addressing one scalability problem of InfiniBand the protocol introduced another. To alleviate the connection scalability problem of the B-SRQprotocol a small enhancement to the reliable connection transport was requested which would allow multiple shared receive queues to be attached to a single reliable connection. This modified reliable connection transport is now known as the extended reliable connection transport.X-SRQis a new transport protocol in Open MPI based on B-SRQwhich takes advantage of this improvement in connection scalability. This paper introduces the X-SRQprotocol and details the significantly improved scalability of the protocol over B-SRQand its reduction of the memory footprint of connection state by as much as 2 orders of magnitude on large scale multi-core systems. In addition to improving scalability, performance of latency-sensitive collective operations are improved by up to 38% while significantly decreasing the variability of results. A detailed analysis of the improved memory scalability as well as the improved performance are discussed.