Automatic Transformation for Overlapping Communication and Computation

Authors:
Changjun Hu; Yewei Shao; Jue Wang; Jianjiang Li
Affiliations:
School of Information Engineering, University of Science and Technology Beijing, Beijing, P.R. China; School of Information Engineering, University of Science and Technology Beijing, Beijing, P.R. China; School of Information Engineering, University of Science and Technology Beijing, Beijing, P.R. China; School of Information Engineering, University of Science and Technology Beijing, Beijing, P.R. China
Venue:
NPC '08 Proceedings of the IFIP International Conference on Network and Parallel Computing
Year:
2008

Abstract

Message passing is a predominant programming paradigm for distributed-memory systems. RDMA networks such as InfiniBand and Myrinet reduce communication overhead by allowing communication to overlap with computation. To make this overlap more effective, we propose a source-to-source transformation scheme that automatically restructures message-passing code. Extensions to the control-flow graph model the message-passing program accurately and support effective data-flow analysis. This analysis identifies the minimal region between a producer and a consumer that contains message-passing function calls. Using inter-procedural data-flow analysis, the transformation scheme then enables communication to overlap with computation. Experiments on the NAS Parallel Benchmarks show that, on distributed-memory systems, the versions employing communication-computation overlap run faster than the original programs.
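
The restructuring described in the abstract can be illustrated with a toy example. The sketch below is not the authors' algorithm: it handles only straight-line code, with no control-flow graph and no inter-procedural analysis, and every name in it (`Stmt`, `touches`, `overlap`, `irecv`, `wait`) is hypothetical. It assumes each statement comes annotated with the variables it writes and reads, splits one blocking call into a nonblocking start and a wait, hoists the start up to the nearest earlier statement that touches the buffer, and sinks the wait down to the nearest later one (the consumer).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stmt:
    """One straight-line statement with hand-written def/use sets (assumed input)."""
    text: str
    defs: frozenset = frozenset()   # variables the statement writes
    uses: frozenset = frozenset()   # variables the statement reads
    comm_buf: Optional[str] = None  # buffer name if this is a communication call


def touches(stmt: Stmt, buf: str) -> bool:
    """Conservative dependence test: any read or write of buf counts."""
    return buf in stmt.defs or buf in stmt.uses


def overlap(stmts: List[Stmt], i: int) -> List[Stmt]:
    """Split the blocking call at index i into a start/wait pair and move them apart."""
    call = stmts[i]
    buf = call.comm_buf
    if buf is None:
        raise ValueError("statement is not a communication call")

    # Hoist the start: stop below the nearest earlier statement that touches
    # the buffer, or below another communication call (to keep message order).
    start = i
    while start > 0 and not touches(stmts[start - 1], buf) \
            and stmts[start - 1].comm_buf is None:
        start -= 1

    # Sink the wait: stop above the nearest later statement that touches the
    # buffer (the consumer), or above another communication call.
    end = i + 1
    while end < len(stmts) and not touches(stmts[end], buf) \
            and stmts[end].comm_buf is None:
        end += 1

    begin = Stmt("i" + call.text, call.defs, call.uses, buf)  # recv -> irecv
    wait = Stmt(f"wait({buf})", call.defs, call.uses, buf)
    # Everything between begin and wait can now run while the message is in flight.
    return (stmts[:start] + [begin] + stmts[start:i]
            + stmts[i + 1:end] + [wait] + stmts[end:])


prog = [
    Stmt("old = norm(halo)", defs={"old"}, uses={"halo"}),
    Stmt("a = f(interior)", defs={"a"}, uses={"interior"}),
    Stmt("recv(halo)", defs={"halo"}, comm_buf="halo"),
    Stmt("b = g(a)", defs={"b"}, uses={"a"}),
    Stmt("c = h(halo, b)", defs={"c"}, uses={"halo", "b"}),
]

for s in overlap(prog, 2):
    print(s.text)
```

In this example the receive is posted right after the last read of the old `halo` contents, and the wait lands just before `c = h(halo, b)`, so `a = f(interior)` and `b = g(a)` execute while the transfer is pending. The dependence test is deliberately conservative; for a send, for instance, reads of the buffer could safely cross the pending operation, which this sketch does not exploit.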