EuroPVM/MPI '06: Proceedings of the 13th European PVM/MPI User's Group Conference on Recent Advances in Parallel Virtual Machine and Message Passing Interface
We consider parallel applications that use the MPI programming interface for inter-process communication, and we determine the processor communication overhead on high-performance computing clusters built with high-speed interconnects such as PathScale InfiniPath and running the open-source Open MPI implementation, the PathScale MPI implementation, or both. We show that, for large messages, the processor overhead is substantial for both MPI implementations and for both network interconnects. We then develop a technique, based on multi-threading within the MPI application, for overcoming this processor communication overhead, and we demonstrate that it dramatically reduces the overhead's impact.
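As an illustration of the general approach described in the abstract, the following is a minimal sketch, not the authors' implementation, of how a dedicated POSIX thread can absorb the send-side processor overhead of a large MPI message while the main thread continues computing. The message size, the placeholder computation, and the reliance on MPI_THREAD_MULTIPLE support are assumptions made for this example; the paper's actual technique may differ.

/*
 * Hypothetical sketch (not the paper's code): delegate a large MPI_Send
 * to a helper POSIX thread so the sending process keeps computing while
 * the MPI library pays the per-message processor overhead.
 * Build with: mpicc -pthread overlap.c -o overlap
 */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define N (1 << 22)  /* illustrative large message: 4M doubles (32 MB) */

struct send_args {
    double *buf;
    int dest;
};

/* Communication thread: performs the blocking send off the critical path. */
static void *comm_thread(void *p)
{
    struct send_args *a = p;
    MPI_Send(a->buf, N, MPI_DOUBLE, a->dest, 0, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, size;

    /* Ask the MPI library for full thread support. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *buf = malloc(N * sizeof *buf);
    for (int i = 0; i < N; i++)
        buf[i] = (double)rank;

    if (rank == 0 && size > 1) {
        struct send_args a = { buf, 1 };
        pthread_t t;
        pthread_create(&t, NULL, comm_thread, &a);

        /* Main thread overlaps useful work with the in-flight send. */
        double acc = 0.0;
        for (long i = 0; i < 100000000L; i++)
            acc += (double)i * 1e-9;

        pthread_join(t, NULL);
        printf("rank 0: computation overlapped with send, acc=%f\n", acc);
    } else if (rank == 1) {
        MPI_Recv(buf, N, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Requesting MPI_THREAD_MULTIPLE lets the helper thread call MPI concurrently with any communication the main thread might issue; if an implementation offers only MPI_THREAD_FUNNELED, all MPI calls would instead have to be routed through a single thread.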