The Impact of MPI Queue Usage on Message Latency

Authors:
Keith D. Underwood, Ron Brightwell
Affiliations:
Sandia National Laboratories (both authors)
Venue:
ICPP '04 Proceedings of the 2004 International Conference on Parallel Processing
Year:
2004

Abstract

It is well known that traditional micro-benchmarks do not fully capture the salient architectural features that impact application performance. Even worse, micro-benchmarks that target MPI and the communications sub-system do not accurately represent how applications actually use MPI. For example, traditional MPI latency benchmarks time a ping-pong exchange with one send and one receive on each of two nodes, and the time to post the receive is never counted as part of the latency. This scenario is not even marginally representative of most applications. This paper presents two new micro-benchmarks that measure network latency under conditions that more closely reflect typical MPI usage. These benchmarks are used to evaluate modern high-performance networks, including Quadrics, InfiniBand, and Myrinet.
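The abstract does not describe an implementation, so the following is only a minimal sketch of why queue usage can matter for latency. It assumes a simple array-backed posted-receive queue that is searched linearly from the head, so the earliest-posted matching receive wins (MPI's ordering rule). The names `ANY_SOURCE`, `ANY_TAG`, `posted_queue`, `post_recv`, and `match_incoming` are hypothetical stand-ins, communicators are ignored, and this is not the paper's benchmark or any particular network's matching engine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG. */
#define ANY_SOURCE (-1)
#define ANY_TAG    (-1)
#define QUEUE_CAP  1024

typedef struct {
    int source; /* expected sender rank, or ANY_SOURCE */
    int tag;    /* expected tag, or ANY_TAG */
} posted_recv;

typedef struct {
    posted_recv entries[QUEUE_CAP]; /* kept in the order receives were posted */
    size_t len;
} posted_queue;

/* Append a receive to the tail of the posted-receive queue. */
static void post_recv(posted_queue *q, int source, int tag) {
    assert(q->len < QUEUE_CAP);
    q->entries[q->len].source = source;
    q->entries[q->len].tag = tag;
    q->len++;
}

/* Does a posted receive accept a message with this envelope? */
static bool matches(const posted_recv *r, int source, int tag) {
    return (r->source == ANY_SOURCE || r->source == source) &&
           (r->tag == ANY_TAG || r->tag == tag);
}

/* Search from the head for the first matching receive and remove it.
   Returns the number of entries examined (the linear-search cost), or -1
   if nothing matched; a real implementation would then queue the message
   as unexpected. */
static long match_incoming(posted_queue *q, int source, int tag) {
    for (size_t i = 0; i < q->len; i++) {
        if (matches(&q->entries[i], source, tag)) {
            /* Shift later entries down to preserve post order. */
            for (size_t j = i + 1; j < q->len; j++) {
                q->entries[j - 1] = q->entries[j];
            }
            q->len--;
            return (long)(i + 1);
        }
    }
    return -1;
}
```

Under these assumptions, a message whose receive was posted behind 100 non-matching receives costs 101 comparisons, whereas a ping-pong benchmark with a single posted receive always matches on the first entry, so it never exercises this search cost.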