A Loop Transformation Algorithm for Communication Overlapping

Authors:
Kazuaki Ishizaki;Hideaki Komatsu;Toshio Nakatani
Affiliations:
1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan;1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan;1623-14, Shimotsuruma, Yamato-shi, Kanagawa-ken 242-8502, Japan
Venue:
International Journal of Parallel Programming - Special issue on international symposium on high performance computing 1997, part I
Year:
2000

Abstract

Overlapping communication with computation is a well-known approach to improving performance. Previous research has focused on optimizations performed by the programmer. This paper presents a compiler algorithm that automatically determines the appropriate loop indices of a given nested loop and applies loop interchange and tiling in order to overlap communication with computation. The algorithm avoids generating redundant communication by providing a framework for combining information on data dependence, communication, and reuse. The paper also describes a method of generating messages to exchange data between processors for tiled loops on distributed memory machines. The algorithm has been implemented in the authors' High Performance Fortran (HPF) compiler, and experimental results show that it is effective on distributed memory machines such as the RISC System/6000 Scalable POWERparallel System. The paper also discusses the architectural problems of efficient optimization.
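
The general idea of tiling a loop so that communication for one tile overlaps with computation on another can be illustrated with a minimal sketch. This is not the paper's algorithm or the code generated by the HPF compiler: it is a one-dimensional toy, it omits loop interchange and the dependence, communication, and reuse analysis, and it stands in for message passing with a background thread. All names (fetch_tile, compute_tile, overlapped_add, tile_size) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_tile(remote, start, stop):
    """Hypothetical stand-in for receiving one tile of remote data."""
    return remote[start:stop]


def compute_tile(local, received, start):
    """Per-tile computation: add the received values to the local ones."""
    return [local[start + k] + v for k, v in enumerate(received)]


def overlapped_add(local, remote, tile_size):
    """Tile the loop over the data; fetch tile i+1 while computing tile i."""
    n = len(local)
    bounds = [(s, min(s + tile_size, n)) for s in range(0, n, tile_size)]
    result = []
    if not bounds:
        return result
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Start the transfer for the first tile before any computation.
        pending = pool.submit(fetch_tile, remote, *bounds[0])
        for i, (start, _stop) in enumerate(bounds):
            received = pending.result()  # wait for this tile's data
            if i + 1 < len(bounds):
                # Issue the next transfer so it runs during this tile's work.
                pending = pool.submit(fetch_tile, remote, *bounds[i + 1])
            result.extend(compute_tile(local, received, start))
    return result


if __name__ == "__main__":
    print(overlapped_add([1, 2, 3, 4, 5], [10, 20, 30, 40, 50], tile_size=2))
```

In this sketch the tile size controls the trade-off: smaller tiles expose more opportunities for overlap but issue more transfers, while a single tile covering the whole loop degenerates to communicating first and computing afterwards.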