Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Applications that execute on parallel clusters face scalability concerns because of the high communication overhead usually associated with such environments. Modern network technologies that support Remote Direct Memory Access (RDMA) can offer true zero-copy communication and reduce communication overhead by overlapping it with computation. For this approach to be effective, the parallel application using the cluster must be structured in a way that enables communication-computation overlapping. Unfortunately, the trade-off between maintainability and performance often leads to a structure that prevents exploiting the potential for communication-computation overlapping. This paper describes a source-to-source optimizing transformation that can be performed by an automatic (or semi-automatic) system in order to restructure MPI codes towards maximizing communication-computation overlapping.
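To make the idea concrete, the sketch below (plain C with MPI, not code from the paper) shows the kind of restructuring such a transformation aims for: nonblocking MPI_Irecv/MPI_Isend are posted early, the computation that does not depend on the incoming data runs while the transfer is in flight, and the matching MPI_Waitall is deferred until just before the dependent computation. The ring topology, buffer sizes, and the compute_interior/compute_boundary kernels are assumptions made purely for illustration.

```c
/* Illustrative sketch only: a nearest-neighbour exchange restructured so the
 * transfer overlaps with computation on data that does not depend on it.
 * Ranks, sizes and the "compute_*" routines are hypothetical placeholders. */
#include <mpi.h>
#include <stdio.h>

#define N 1024

static double halo_in[N], halo_out[N], interior[N];

/* Stand-ins for the application's real kernels. */
static void compute_interior(double *a, int n) { for (int i = 0; i < n; i++) a[i] *= 2.0; }
static void compute_boundary(const double *halo, int n) { (void)halo; (void)n; }

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;

    /* Post the communication early with nonblocking calls ... */
    MPI_Request reqs[2];
    MPI_Irecv(halo_in,  N, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(halo_out, N, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... overlap: do the work that does not need the incoming halo ... */
    compute_interior(interior, N);

    /* ... and wait only just before the dependent computation, so the
     * transfer can proceed "behind" the interior work instead of
     * serialising with it. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    compute_boundary(halo_in, N);

    if (rank == 0) printf("exchange complete on %d ranks\n", size);
    MPI_Finalize();
    return 0;
}
```

Whether the transfer actually progresses concurrently with compute_interior depends on the MPI implementation and the interconnect; RDMA-capable networks, as noted above, are what make this overlap genuinely effective.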