Global arrays: a nonuniform memory access programming model for high-performance computers
The Journal of Supercomputing
BSPlib: The BSP programming library
Parallel Computing
MPI-2 implementation on Fujitsu generic message passing kernel
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
Single sided MPI implementations for SUN MPIr
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
Exploiting Transparent Remote Memory Access for Non-Contiguous- and One-Sided-Communication
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
GASNet Specification, v1.1
High performance MPI-2 one-sided communication over InfiniBand
CCGRID '04 Proceedings of the 2004 IEEE International Symposium on Cluster Computing and the Grid
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
Design issues in the implementation of MPI2 one sided communication in Ethernet based networks
PDCN'07 Proceedings of the 25th conference on Proceedings of the 25th IASTED International Multi-Conference: parallel and distributed computing and networks
Multicast communication in wormhole-routed 2D torus networks with hamiltonian cycle model
Journal of Systems Architecture: the EUROMICRO Journal
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
Formal verification of programs that use MPI one-sided communication
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
An evaluation of implementation options for MPI one-sided communication
PVM/MPI'05 Proceedings of the 12th European PVM/MPI users' group conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Analysis of implementation options for MPI-2 one-sided
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Revealing the performance of MPI RMA implementations
PVM/MPI'07 Proceedings of the 14th European conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Hi-index | 0.00 |
One-sided communication in Message Passing Interface (MPI) requires the use of one of three different synchronization mechanisms, which indicate when the one-sided operation can be started and when the operation is completed. Efficient implementation of the synchronization mechanisms is critical to achieving good performance with one-sided communication. However, our performance measurements indicate that in many MPI implementations, the synchronization functions add significant overhead, resulting in one-sided communication performing much worse than point-to-point communication for short- and medium-sized messages. In this paper, we describe our efforts to minimize the overhead of synchronization in our implementation of one-sided communication in MPICH2. We describe our optimizations for all three synchronization mechanisms defined in MPI: fence, post-start-complete-wait, and lock-unlock. Our performance results demonstrate that, for short messages, MPICH2 performs six times faster than LAM for fence synchronization and 50% faster for post-start-complete-wait synchronization, and it performs more than twice as fast as Sun MPI for all three synchronization methods.