OMPI: optimizing MPI programs using partial evaluation

Authors:
Hirotaka Ogawa;Satoshi Matsuoka
Affiliations:
Department of Information Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan;Department of Information Engineering, The University of Tokyo, 7-3-1 Hongo, Bunkyo-ku, Tokyo 113, Japan
Venue:
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Year:
1996

Citing 5
Cited 11

Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

Low-latency message communication support for the AP1000
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture

An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming

Using MPI: portable parallel programming with the message-passing interface

Communication compilation for unreliable networks
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)

Learning from the Success of MPI
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing

CC-MPI: a compiled communication capable MPI prototype for ethernet switched clusters
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming

Non-strict execution in parallel and distributed computing
International Journal of Parallel Programming

Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing

An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I

STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing

Automatic and transparent optimizations of an application's MPI communication
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing

Automatic memory optimizations for improving MPI derived datatype performance
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface

Exploiting single-assignment properties to optimize message-passing programs by code transformations
IFL'04 Proceedings of the 16th international conference on Implementation and Application of Functional Languages

Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis

Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
ACM Transactions on Architecture and Code Optimization (TACO)

Abstract

MPI is gaining acceptance as a standard for message passing in high-performance computing, owing to its powerful and flexible support for various communication styles. However, the complexity of its API imposes significant software overhead, and as a result the applicability of MPI has been restricted to rather regular, coarse-grained computations. Our OMPI (Optimizing MPI) system removes much of this excess overhead by employing partial evaluation techniques that exploit static information in MPI calls. Because partial evaluation alone is insufficient, we also use template functions for further optimization. To validate the effectiveness of the OMPI system, we ran baseline as well as more extensive benchmarks on a set of application cores with different communication characteristics on the 64-node Fujitsu AP1000 MPP. The benchmarks show that OMPI improves execution efficiency by as much as a factor of two for communication-intensive application cores, with minimal increase in code size. It also performs significantly better than a previous dynamic optimization technique.
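
The idea of specializing a communication call on its statically known arguments can be illustrated with a small toy model. The sketch below is not OMPI and does not use the MPI API: `generic_send`, `specialize_send`, the `FORMATS` table, and the list-based `wire` are all hypothetical names chosen for illustration, and the specialization here happens at run time through a closure, whereas the abstract describes exploiting static information in the calls themselves. It only shows the general shape of partial evaluation: work that depends on known arguments is done once, and a smaller residual function handles the rest.

```python
import struct

# Hypothetical datatype table; a stand-in for a real library's datatype handling.
FORMATS = {"int": "i", "double": "d"}


def generic_send(wire, buf, count, datatype, dest, tag):
    """Generic path: every call re-validates and re-resolves its arguments."""
    if datatype not in FORMATS:
        raise ValueError("unknown datatype")
    if count < 0 or count > len(buf):
        raise ValueError("bad count")
    fmt = "<" + str(count) + FORMATS[datatype]
    wire.append((dest, tag, struct.pack(fmt, *buf[:count])))


def specialize_send(count, datatype, dest, tag):
    """Partially evaluate generic_send for arguments known ahead of time.

    Checks and lookups that depend only on the static arguments run once,
    here. The returned residual function keeps only the work that depends
    on the dynamic arguments (wire and buf).
    """
    if datatype not in FORMATS:
        raise ValueError("unknown datatype")
    if count < 0:
        raise ValueError("bad count")
    packer = struct.Struct("<" + str(count) + FORMATS[datatype])

    def residual_send(wire, buf):
        # The buffer length is only known at call time, so this check stays.
        if count > len(buf):
            raise ValueError("bad count")
        wire.append((dest, tag, packer.pack(*buf[:count])))

    return residual_send


# Both paths put the same message on the (simulated) wire.
data = [1.0, 2.0, 3.0]
wire_generic, wire_special = [], []
generic_send(wire_generic, data, 3, "double", dest=1, tag=7)
send3 = specialize_send(3, "double", dest=1, tag=7)
send3(wire_special, data)
```

In this toy model the residual function skips the datatype lookup and format construction on every call; which checks can actually be removed in a real MPI implementation depends on what is statically known at each call site.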