Analysis of the Memory Registration Process in the Mellanox InfiniBand Software Stack

Authors:
Frank Mietke; Robert Rex; Robert Baumgartl; Torsten Mehlan; Torsten Hoefler; Wolfgang Rehm
Affiliations:
Department of Computer Science, Chemnitz University of Technology, Germany (all authors)
Venue:
Euro-Par'06 Proceedings of the 12th international conference on Parallel Processing
Year:
2006

Abstract

To exploit high-speed interconnects such as InfiniBand, it is important to minimize communication overhead. The most disruptive source of overhead is the registration of communication memory. In this paper, we present our analysis of the memory registration process inside the Mellanox InfiniBand driver and discuss possible ways to alleviate this bottleneck. Using the Read Time-Stamp Counter (RDTSC) instruction, we evaluate and characterize the most time-consuming parts of the execution path of the memory registration function. We present measurements on AMD Opteron and Intel Xeon systems with different types of Host Channel Adapters (HCAs) for PCI-X and PCI Express. Finally, we present first results on using Linux hugepage support to reduce the time needed to register a memory region.
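The abstract relies on two mechanisms that a short sketch can make concrete. First, cycle-level timing: RDTSC reads the processor's time-stamp counter, so the cost of a code path is the difference between two reads taken before and after it. Second, the hugepage result: assuming that registration cost is dominated by per-page work (pinning each page and providing the HCA with a translation for it), the number of pages a buffer spans is exactly the quantity that larger pages reduce. The C sketch below is our illustration, not the authors' instrumentation. The helper names `pages_spanned`, `read_cycles` and `time_call` are hypothetical, and in a real measurement the timed function would be a wrapper around a registration call such as libibverbs' `ibv_reg_mr()`.

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime on non-x86 builds */
#include <assert.h>
#include <stdint.h>
#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h> /* __rdtsc() */
#else
#include <time.h>
#endif

/* Number of pages of size page_size covered by [addr, addr + len).
 * Hypothetical helper: under the per-page-cost assumption, each of these
 * pages must be pinned and translated during registration. */
static uint64_t pages_spanned(uint64_t addr, uint64_t len, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0); /* power of two */
    if (len == 0)
        return 0;
    uint64_t first = addr & ~(page_size - 1);            /* page holding first byte */
    uint64_t last = (addr + len - 1) & ~(page_size - 1); /* page holding last byte */
    return (last - first) / page_size + 1;
}

/* Read a timestamp: the TSC via RDTSC on x86; elsewhere this sketch
 * falls back to a monotonic clock in nanoseconds. Note that RDTSC is not
 * a serializing instruction, so very short intervals can be skewed by
 * out-of-order execution. */
static uint64_t read_cycles(void)
{
#if defined(__x86_64__) || defined(__i386__)
    return __rdtsc();
#else
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
#endif
}

/* Time a single call of fn(arg) in counter ticks. In a registration
 * benchmark, fn would be a (hypothetical) wrapper around ibv_reg_mr(). */
static uint64_t time_call(void (*fn)(void *), void *arg)
{
    uint64_t start = read_cycles();
    fn(arg);
    return read_cycles() - start;
}
```

For a 1 GiB buffer aligned to a page boundary, `pages_spanned` reports 262144 pages of 4 KiB but only 512 pages of 2 MiB. This is why, under the stated assumption, hugepages can shorten registration. Placing `read_cycles()` around individual stages of the registration path, rather than around the whole call, is what allows the most time-consuming parts to be identified.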