An efficient design for fast memory registration in RDMA

Authors:
Li Ou;Xubin He;Jizhong Han
Affiliations:
DELL Inc., USA;Electrical and Computer Engineering Department, Tennessee Technological University, Cookeville, TN 38505, USA;Institute of Computing Technology, Chinese Academy of Sciences, China
Venue:
Journal of Network and Computer Applications
Year:
2009

Abstract

Remote Direct Memory Access (RDMA) improves network bandwidth and reduces latency by eliminating unnecessary copies from the network interface card to application buffers. Managing communication buffers so as to reduce the cost of memory registration and deregistration, however, remains a significant challenge. Previous studies use a pin-down cache and batched deregistration, but manage cache space with only a simple LRU replacement algorithm. In this paper, we evaluate the cost of memory registration in both user and kernel space. Based on this analysis, we reduce the overhead of communication buffer management in two ways at once: a Memory Registration Region Cache (MRRC), and a Fast RDMA Read and Write Process (FRRWP) that optimizes the RDMA communication process between clients and servers. MRRC manages memory in units of memory regions and replaces old regions according to both their size and their recency. FRRWP overlaps memory registrations between a client and a server, and allows applications to submit RDMA write operations without being blocked by message synchronization. We compare the performance of MRRC and FRRWP with that of traditional RDMA operations. The results show that the new design reduces the total cost of memory registration and the overall communication latency by up to 70%.
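
The abstract states that MRRC replaces old memory regions according to both size and recency, but it does not give the replacement rule. The following is only a minimal sketch of what such a cache might look like, not the authors' algorithm. The class name RegionCache, the method acquire, and in particular the scoring formula (size divided by age, with the lowest score evicted first) are hypothetical choices made for illustration. No real RDMA registration calls are made; registrations are merely counted.

```python
from dataclasses import dataclass


@dataclass
class Region:
    addr: int
    size: int
    last_use: int


class RegionCache:
    """Toy cache of registered memory regions (simulation only)."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.clock = 0             # logical time, advanced on every acquire
        self.regions = {}          # (addr, size) -> Region
        self.registrations = 0     # number of simulated registrations

    def _score(self, region):
        # Hypothetical policy, NOT taken from the paper: regions that are
        # small and have not been used for a long time score lowest and
        # are evicted first.
        age = self.clock - region.last_use  # always >= 1 when called
        return region.size / age

    def acquire(self, addr, size):
        """Return True on a cache hit, False if a registration was needed."""
        self.clock += 1
        key = (addr, size)
        region = self.regions.get(key)
        if region is not None:
            region.last_use = self.clock
            return True

        self.registrations += 1
        if size > self.capacity:
            # Too large to cache: register once, leave the cache untouched.
            return False

        # Evict lowest-scoring regions until the new region fits.
        while self.used + size > self.capacity:
            victim = min(self.regions.values(), key=self._score)
            del self.regions[(victim.addr, victim.size)]
            self.used -= victim.size   # stands in for deregistration

        self.regions[key] = Region(addr, size, self.clock)
        self.used += size
        return False
```

Under this assumed rule, a plain LRU policy is the special case in which the size term is dropped from the score. Whether a real design should favor keeping large regions (costly to re-register) or small ones (cheap to keep pinned) is a tuning decision that this sketch does not settle.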