Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
Supercompilers for parallel and vector computers
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
The parallel execution of DO loops
Communications of the ACM
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
Controlling application grain size on a network of workstations
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Accurately Selecting Block Size at Runtime in Pipelined Parallel Programs
International Journal of Parallel Programming
Chain Grouping: A Method for Partitioning Loops onto Mesh-Connected Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Compile Time Barrier Synchronization Minimization
IEEE Transactions on Parallel and Distributed Systems
Optimal task scheduling at run time to exploit intra-tile parallelism
Parallel Computing
Efficient hybrid parallelisation of tiled algorithms on SMP clusters
International Journal of Computational Science and Engineering
Selecting the tile shape to reduce the total communication volume
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Resource-sensitive synchronization inference by abduction
POPL '12 Proceedings of the 39th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proof-Directed Parallelization Synthesis by Separation Logic
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compiling affine loop nests for distributed-memory parallel architectures
SC '13 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
If the iterations of a loop nest cannot be partitioned into independent tasks, data communication arising from data dependences is inevitable when the nest is executed on a parallel machine. Such a loop nest is referred to as a DOACROSS loop nest. This paper is concerned with compiler algorithms for parallelizing DOACROSS loop nests on distributed-memory multicomputers. We present a method that combines loop tiling, chain-based scheduling, and indirect message passing to generate efficient message-passing parallel code. Experimental results on the Fujitsu AP1000 show that, by tuning these techniques, low communication overhead and high speedup can be achieved for DOACROSS loop nests on multicomputers.