When a message passing protocol copies data through the central processor, the copy overhead limits data transfer bandwidth. If the network interface transfers the user's memory directly to the network by issuing DMA, such data copies can be eliminated. Since the DMA facility accesses the physical memory address space, user virtual memory must be pinned down to a physical memory location before a message is sent or received. If every message transfer invokes the pin-down and release kernel primitives, message transfer bandwidth decreases, because those primitives are quite expensive. We propose zero-copy message transfer with a pin-down cache technique, which reuses already pinned-down areas to reduce the number of calls to the pin-down and release primitives. The proposed facility has been implemented in the PM low-level communication library on our RWC PC Cluster II, which consists of 64 Pentium Pro 200 MHz CPUs connected by a Myricom Myrinet network and runs NetBSD. PM achieves 108.8 MBytes/sec at a 100% pin-down cache hit ratio and 78.7 MBytes/sec when every access misses the pin-down cache. An MPI library has been implemented on top of PM. According to NAS Parallel Benchmarks results, an application still achieves better performance even when the pin-down cache miss ratio is very high.
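The pin-down cache idea can be illustrated with a small simulation. This is only a sketch under stated assumptions, not PM's implementation: the class name `PinDownCache`, the method `prepare_transfer`, the 4096-byte page size, the page-granular bookkeeping, and the LRU replacement policy are all hypothetical choices made for illustration. The kernel pin-down and release primitives are modeled as counters rather than real system calls.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size in bytes (illustrative)


class PinDownCache:
    """Toy model: pages stay pinned after a transfer and are evicted in LRU order."""

    def __init__(self, max_pinned_pages):
        self.max_pinned_pages = max_pinned_pages
        self.pinned = OrderedDict()  # page number -> True, least recently used first
        self.pin_calls = 0      # simulated calls to the kernel pin-down primitive
        self.release_calls = 0  # simulated calls to the kernel release primitive

    def _pages(self, addr, length):
        # Page numbers covered by the buffer [addr, addr + length).
        first = addr // PAGE_SIZE
        last = (addr + length - 1) // PAGE_SIZE
        return range(first, last + 1)

    def prepare_transfer(self, addr, length):
        """Make sure the whole buffer is pinned; return True on a full cache hit."""
        pages = self._pages(addr, length)
        if len(pages) > self.max_pinned_pages:
            raise ValueError("buffer does not fit in the pinned-page budget")
        hit = True
        for page in pages:
            if page in self.pinned:
                # Already pinned by an earlier transfer: no kernel call needed.
                self.pinned.move_to_end(page)
                continue
            hit = False
            if len(self.pinned) >= self.max_pinned_pages:
                # Budget exhausted: release the least recently used page.
                self.pinned.popitem(last=False)
                self.release_calls += 1
            self.pinned[page] = True
            self.pin_calls += 1
        return hit


# Sending the same three-page buffer 100 times pins its pages only once.
cache = PinDownCache(max_pinned_pages=8)
results = [cache.prepare_transfer(0x10000, 3 * PAGE_SIZE) for _ in range(100)]
print(cache.pin_calls, cache.release_calls, results.count(True))
```

In this model, repeated transfers from the same buffer pay the pin-down cost only on the first use, which corresponds to the high hit ratio case in the abstract. A workload that keeps touching new pages beyond the budget triggers a release and a pin-down on every miss.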