Architectural requirements and scalability of the NAS parallel benchmarks
SC '99 Proceedings of the 1999 ACM/IEEE conference on Supercomputing
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A TeraFLOP Supercomputer in 1996: The ASCI TFLOP System
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Design and Implementation of MPI on Puma Portals
MPIDC '96 Proceedings of the Second MPI Developers Conference
The Impact of MPI Queue Usage on Message Latency
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
PDCN'06 Proceedings of the 24th IASTED international conference on Parallel and distributed computing and networks
Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Distributed SBP Cholesky factorization algorithms with near-optimal scheduling
ACM Transactions on Mathematical Software (TOMS)
Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application
Proceedings of the 24th ACM International Conference on Supercomputing
A preliminary analysis of the infinipath and XD1 network interfaces
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Light-weight communications on Intel's single-chip cloud computer processor
ACM SIGOPS Operating Systems Review
Design and evaluation of nonblocking collective I/O operations
EuroMPI'11 Proceedings of the 18th European MPI Users' Group conference on Recent advances in the message passing interface
A case for non-blocking collective operations
ISPA'06 Proceedings of the 2006 international conference on Frontiers of High Performance Computing and Networking
Hi-index | 0.00 |
The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends this previous work by further qualifying the source of the performance advantage (offload, overlap, or independent progress).