An analysis of the impact of MPI overlap and independent progress

Authors:
Ron Brightwell; Keith D. Underwood
Affiliations:
Sandia National Laboratories, Albuquerque, NM;Sandia National Laboratories, Albuquerque, NM
Venue:
Proceedings of the 18th annual international conference on Supercomputing
Year:
2004

Abstract

The overlap of computation and communication has long been considered to be a significant performance benefit for applications. Similarly, the ability of MPI to make independent progress (that is, to make progress on outstanding communication operations while not in the MPI library) is also believed to yield performance benefits. Using an intelligent network interface to offload the work required to support overlap and independent progress is thought to be an ideal solution, but the benefits of this approach have been poorly studied at the application level. This lack of analysis is complicated by the fact that most MPI implementations do not sufficiently support overlap or independent progress. Recent work has demonstrated a quantifiable advantage for an MPI implementation that uses offload to provide overlap and independent progress. This paper extends this previous work by further qualifying the source of the performance advantage (offload, overlap, or independent progress).