An HPF compiler for the IBM SP2
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Co-array Fortran for parallel programming
ACM SIGPLAN Fortran Forum
Communication Generation for Aligned and Cyclic(K) Distributions Using Integer Lattice
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Algorithm for Communication Overlapping
International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
A communication placement framework with unified dependence and data-flow analysis
HIPC '96 Proceedings of the Third International Conference on High-Performance Computing (HiPC '96)
Titanium Language Reference Manual
Titanium Language Reference Manual
Transformations to Parallel Codes for Communication-Computation Overlap
SC '05 Proceedings of the 2005 ACM/IEEE conference on Supercomputing
Optimizing irregular shared-memory applications for distributed-memory systems
Proceedings of the eleventh ACM SIGPLAN symposium on Principles and practice of parallel programming
OpenMP Extensions for Irregular Parallel Applications on Clusters
IWOMP '07 Proceedings of the 3rd international workshop on OpenMP: A Practical Programming Model for the Multi-Core Era
Optimizing bandwidth limited problems using one-sided communication and overlap
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Hi-index | 0.00 |
Message-passing is a predominant programming paradigm for distributed memory systems. RDMA networks like infiniBand and Myrinet reduce communication overhead by overlapping communication with computation. For the overlap to be more effective, we propose a source-to-source transformation scheme by automatically restructuring message-passing codes. The extensions to control-flow graph can accurately analyze the message-passing program and help perform data-flow analysis effectively. This analysis identifies the minimal region between producer and consumer, which contains message-passing functional calls. Using inter-procedural data-flow analysis, the transformation scheme enables the overlap of communication with computation. Experiments on the well-known NAS Parallel Benchmarks show that for distributed memory systems, versions employing communication-computation overlap are faster than original programs.