Pin-down Cache: A Virtual Memory Management Technique for Zero-copy Communication

Authors:
Affiliations:
Venue:
IPPS '98 Proceedings of the 12th International Parallel Processing Symposium
Year:
1998

Abstract

Copying data through the central processor in a message-passing protocol limits data transfer bandwidth. If the network interface transfers the user's memory directly to the network by issuing DMA, such copies can be eliminated. Because the DMA facility accesses the physical memory address space, user virtual memory must be pinned down to a physical memory location before a message is sent or received. If every message transfer invokes the pin-down and release kernel primitives, message transfer bandwidth decreases, since those primitives are quite expensive. We propose zero-copy message transfer with a pin-down cache technique, which reuses already pinned-down areas to reduce the number of calls to the pin-down and release primitives. The proposed facility has been implemented in the PM low-level communication library on our RWC PC Cluster II, which consists of 64 Pentium Pro 200 MHz CPUs connected by a Myricom Myrinet network and runs NetBSD. PM achieves 108.8 MBytes/sec with a 100% pin-down cache hit ratio and 78.7 MBytes/sec when every transfer misses the pin-down cache. An MPI library has been implemented on top of PM. According to NAS Parallel Benchmarks results, applications still achieve better performance even when the cache miss ratio is very high.
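
The reuse idea described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the PM implementation: the class name `PinDownCache`, the 4096-byte page size, the page-granular bookkeeping, the fixed capacity, and the least-recently-used eviction policy are all hypothetical choices made for illustration, and the kernel pin-down and release primitives are modeled as simple counters rather than real system calls.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative only)


class PinDownCache:
    """Toy model of a pin-down cache.

    Pages stay pinned after a transfer so that later transfers touching
    the same pages can skip the expensive pin-down and release calls.
    """

    def __init__(self, max_pinned_pages):
        self.max_pinned_pages = max_pinned_pages
        self.pinned = OrderedDict()  # page number -> None, least recent first
        self.pin_calls = 0           # simulated pin-down primitive calls
        self.release_calls = 0       # simulated release primitive calls

    def acquire(self, addr, length):
        """Make sure [addr, addr + length) is pinned.

        Returns True when every page was already pinned (a cache hit).
        """
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        pages = range(first, last + 1)
        if len(pages) > self.max_pinned_pages:
            raise ValueError("buffer spans more pages than the cache can pin")

        # Pass 1: refresh pages that are already pinned so they are not
        # chosen as eviction victims while the rest of the buffer is pinned.
        missing = []
        for page in pages:
            if page in self.pinned:
                self.pinned.move_to_end(page)
            else:
                missing.append(page)

        # Pass 2: pin the missing pages, releasing the least recently
        # used pages only when the (assumed) capacity limit is reached.
        for page in missing:
            while len(self.pinned) >= self.max_pinned_pages:
                self.pinned.popitem(last=False)
                self.release_calls += 1
            self.pinned[page] = None
            self.pin_calls += 1

        return not missing
```

In this model, sending repeatedly from the same buffer costs pin-down calls only on the first transfer; release calls occur only when the capacity limit forces older pages out, which corresponds to the cache-miss case the abstract measures.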