Active messages: a mechanism for integrated communication and computation
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Low-latency message communication support for the AP1000
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
An efficient implementation scheme of concurrent object-oriented languages on stock multicomputers
PPOPP '93 Proceedings of the fourth ACM SIGPLAN symposium on Principles and practice of parallel programming
Using MPI: portable parallel programming with the message-passing interface
Using MPI: portable parallel programming with the message-passing interface
Communication compilation for unreliable networks
ICDCS '96 Proceedings of the 16th International Conference on Distributed Computing Systems (ICDCS '96)
Learning from the Success of MPI
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
CC--MPI: a compiled communication capable MPI prototype for ethernet switched clusters
Proceedings of the ninth ACM SIGPLAN symposium on Principles and practice of parallel programming
Non-strict execution in parallel and distributed computing
International Journal of Parallel Programming
Automatic generation and tuning of MPI collective communication routines
Proceedings of the 19th annual international conference on Supercomputing
An MPI prototype for compiled communication on Ethernet switched clusters
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
STAR-MPI: self tuned adaptive routines for MPI collective operations
Proceedings of the 20th annual international conference on Supercomputing
Automatic and transparent optimizations of an application's MPI communication
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Automatic memory optimizations for improving MPI derived datatype performance
EuroPVM/MPI'06 Proceedings of the 13th European PVM/MPI User's Group conference on Recent advances in parallel virtual machine and message passing interface
Exploiting single-assignment properties to optimize message-passing programs by code transformations
IFL'04 Proceedings of the 16th international conference on Implementation and Application of Functional Languages
Optimizing fine-grained communication in a biomolecular simulation application on Cray XK6
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
Adaptive communication mechanism for accelerating MPI functions in NoC-based multicore processors
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
MPI is gaining acceptance as a standard for message-passing in high-performance computing, due to its powerful and flexible support of various communication styles. However, the complexity of its API poses significant software overhead, and as a result, applicability of MPI has been restricted to rather regular, coarse-grained computations. Our OMPI (Optimizing MPI) system removes much of the excess overhead by employing partial evaluation techniques, which exploit static information of MPI calls. Because partial evaluation alone is insufficient, we also utilize template functions for further optimization. To validate the effectiveness for our OMPI system, we performed baseline as well as more extensive benchmarks on a set of application cores with different communication characteristics, on the 64-node Fujitsu AP1000 MPP. Benchmarks show that OMPI improves execution efficiency by as much as factor of two for communication-intensive application core with minimal code increase. It also performs significantly better than previous dynamic optimization technique.