MPI-2 provides interfaces for one-sided communication, which is becoming increasingly important in scientific applications. MPI-2 semantics give an implementation the flexibility to reorder one-sided operations within an access epoch. Exploiting this flexibility, in this paper we improve the performance of one-sided communication by scheduling one-sided operations. We propose several re-ordering and aggregation schemes that achieve better network utilization, and we evaluate them on both PCI-X and PCI-Express platforms. With the re-ordering schemes, we observe throughput improvements of up to 76% and latency improvements of up to 40%. With the aggregation scheme, we observe latency improvements of 44% for MPI_Put and 42% for MPI_Get on the PCI-Express platform.
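To make the flexibility concrete, below is a minimal sketch of a fence-synchronized MPI-2 access epoch. The two ranks, buffer sizes, and the split of one transfer into two MPI_Put calls are illustrative assumptions, not the paper's benchmark setup; the point is only that between the two MPI_Win_fence calls the MPI library may reorder or coalesce the puts, since completion is guaranteed only at the closing fence.

/* Minimal sketch of a fence-synchronized access epoch.
 * Assumptions (not from the paper): 2 ranks, contiguous
 * double buffers, rank 0 writing into rank 1's window. */
#include <mpi.h>

#define N 1024

int main(int argc, char **argv) {
    int rank;
    double local[N], remote[N];
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Expose 'remote' as a window for one-sided access. */
    MPI_Win_create(remote, N * sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);              /* open the access epoch */
    if (rank == 0) {
        /* Two puts targeting rank 1. MPI-2 semantics let the
         * implementation reorder or aggregate them within the
         * epoch; they need only be complete at the next fence. */
        MPI_Put(&local[0],   N/2, MPI_DOUBLE, 1, 0,   N/2, MPI_DOUBLE, win);
        MPI_Put(&local[N/2], N/2, MPI_DOUBLE, 1, N/2, N/2, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);              /* close the epoch: all puts complete */

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}

Because the application cannot assume any ordering between the two fences, a scheduling layer inside the MPI library is free to, for example, issue large transfers first for better pipelining or merge small contiguous puts into one network operation, which is the opportunity the re-ordering and aggregation schemes exploit.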