Overlapping computation with communication is a key technique for hiding the effect of communication latency on the performance of parallel applications. The Message Passing Interface (MPI) is a widely used message-passing standard for high-performance computing. One of the most important factors in achieving good overlap is the ability of the MPI implementation to make progress on outstanding communication operations. In this paper, we propose a novel speculative MPI Rendezvous protocol that uses RDMA Read and RDMA Write to improve communication progress and, consequently, the ability to overlap. Performance results based on a modified MPICH2 implementation over 10-Gigabit iWARP Ethernet show a significant (80-100%) improvement in receiver-side overlap and progress ability. We also observed up to a 30% improvement in application wait time for some NPB applications as well as the RADIX application. For applications that do not benefit from this protocol, an adaptation mechanism stops the speculation to reduce the protocol overhead.
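To make the link between progress and overlap concrete, the following is a minimal timing sketch in Python. It is not the protocol proposed in the paper; it is a simplified model that assumes the sender is ready at time zero, that a transfer without independent progress starts only when the receiver enters its wait call, and that a transfer with independent progress starts as soon as the receive is posted. The function names `receiver_wait_time` and `overlap_ratio` are hypothetical and chosen for illustration only.

```python
def receiver_wait_time(compute, transfer, independent_progress):
    """Seconds the receiver spends blocked in its final wait call.

    compute:  application work done between posting the receive and waiting
    transfer: time needed to move the message once the transfer starts
    independent_progress: True if the transfer can proceed during computation
    (All names and the model itself are illustrative assumptions.)
    """
    if compute < 0 or transfer < 0:
        raise ValueError("times must be non-negative")
    if independent_progress:
        # The transfer runs alongside the computation; only the part that
        # outlasts the computation is visible as wait time.
        return max(0.0, transfer - compute)
    # No progress until the wait call: the whole transfer is exposed.
    return float(transfer)


def overlap_ratio(compute, transfer, independent_progress):
    """Fraction of the transfer time hidden behind computation (0.0 to 1.0)."""
    if transfer == 0:
        return 1.0  # nothing to hide
    wait = receiver_wait_time(compute, transfer, independent_progress)
    return (transfer - wait) / transfer


if __name__ == "__main__":
    for progress in (False, True):
        wait = receiver_wait_time(5.0, 3.0, progress)
        ratio = overlap_ratio(5.0, 3.0, progress)
        print(f"independent_progress={progress}: wait={wait}, overlap={ratio}")
```

In this model, the benefit of better progress shows up directly as reduced wait time, which is the quantity the abstract reports for the NPB and RADIX applications.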