OMPI: optimizing MPI programs using partial evaluation
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
MPI-The Complete Reference, Volume 1: The MPI Core
MPI: A Message-Passing Interface Standard
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
An expert assistant for computer aided parallelization
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Automatic Run-time Parallelization and Transformation of I/O
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Run-time analysis and instrumentation for communication overlap potential
EuroMPI'10 Proceedings of the 17th European MPI users' group meeting conference on Recent advances in the message passing interface
HPC users frequently develop and run their MPI programs without optimizing communication, leading to poor performance on clusters connected with commodity Gigabit Ethernet. Unfortunately, optimizing communication patterns often reduces the clarity and maintainability of the code, and users prefer to focus on the application problem rather than on the tool used to solve it. In this paper, we present a new method for automatically optimizing the communication of any MPI application. A library injected into the application intercepts all MPI calls. The memory associated with each MPI request is protected using hardware-supported memory protection, so the request can proceed in the background as an asynchronous operation while the application continues as if the request had already completed. When the application accesses the protected data, a page fault occurs, and the injected library waits for the background transfer to finish before allowing the application to proceed. On our test cases, performance close to that of manual optimization is observed on Gigabit Ethernet clusters.