LogP: towards a realistic model of parallel computation
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters
CLUSTER '02 Proceedings of the IEEE International Conference on Cluster Computing
A New DMA Registration Strategy for Pinning-Based High Performance Networks
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
Pin-down Cache: A Virtual Memory Management Technique for Zero-copy Communication
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium
HOTI '05 Proceedings of the 13th Symposium on High Performance Interconnects
Memory registration caching correctness
CCGRID '05 Proceedings of the Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05) - Volume 2 - Volume 02
An efficient design for fast memory registration in RDMA
Journal of Network and Computer Applications
LogfP - a model for small messages in InfiniBand
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
High performance RDMA protocols in HPC
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Analysis of the memory registration process in the mellanox infiniband software stack
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Improving reactivity and communication overlap in MPI using a generic I/O manager
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
High performance checksum computation for fault-tolerant MPI over infiniband
EuroMPI'12 Proceedings of the 19th European conference on Recent Advances in the Message Passing Interface
Hi-index | 0.00 |
InfiniBand high performance networks require that the buffers used for sending or receiving data are registered. Since memory registration is an expensive operation, some communication libraries use caching (rcache) to amortize its cost, and copy data into preregistered buffers for small messages. In this paper, we present a software protocol for InfiniBand that always uses a memory copy, and amortizes the cost of this copy with a superpipeline to overlap the memory copy and the RDMA. We propose a performance model of our protocol to study its behavior and optimize its parameters. We have implemented our protocol in the NewMadeleine communication library. The results of MPI benchmarks show a significant improvement in cache-unfriendly applications that do not reuse the same memory blocks all over the time, without degradation for cache-friendly applications.