The PARADIGM (PARAllelizing compiler for DIstributed-memory General-purpose Multicomputers) project at the University of Illinois provides a fully automated means to parallelize programs, written in a serial programming model, for execution on distributed-memory multicomputers. To provide efficient execution, PARADIGM automatically performs various optimizations to reduce the overhead and idle time caused by interprocessor communication. The optimizations studied in this paper include message coalescing, message vectorization, message aggregation, and coarse-grain pipelining. To separate the optimization algorithms from machine-specific details, parameterized models are used to estimate communication and computation costs for a given machine. The models are also used in coarse-grain pipelining to automatically select a task granularity that balances the available parallelism with the costs of communication. To determine the applicability of the optimizations on different machines, we analyzed their performance on an Intel iPSC/860, an Intel iPSC/2, and a Thinking Machines CM-5.
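As a deliberately simplified illustration of two of the optimizations named above, the C sketch below shows message vectorization and a cost-model-driven choice of pipeline granularity. It is not PARADIGM's generated code: MPI stands in for the machines' native message layers (NX on the iPSC systems, CMMD on the CM-5), the linear cost t(n) = alpha + beta*n is only the standard startup-plus-bandwidth form that parameterized models of this kind typically take, and all names and constants here (send_naive, send_vectorized, pick_block_size, N) are hypothetical.

    /* Hypothetical sketch, not PARADIGM output; MPI stands in for NX/CMMD. */
    #include <mpi.h>

    #define N 1024  /* placeholder array length */

    /* Unoptimized: one message per iteration, so the per-message
     * startup cost alpha is paid N times. */
    void send_naive(double *a, int dest) {
        for (int i = 0; i < N; i++)
            MPI_Send(&a[i], 1, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }

    /* Message vectorization: the communication is hoisted out of the
     * loop and all N elements travel in one message, paying alpha once. */
    void send_vectorized(double *a, int dest) {
        MPI_Send(a, N, MPI_DOUBLE, dest, 0, MPI_COMM_WORLD);
    }

    /* Coarse-grain pipelining granularity: under the linear model,
     * pushing n_iters iterations through a p-stage pipeline in blocks
     * of b iterations takes roughly (p + n_iters/b - 1) stages, each
     * costing b*t_comp for computation plus alpha + beta*b*bytes for
     * communication. Small b exposes parallelism sooner; large b
     * amortizes startup. This brute-force search only makes the
     * trade-off concrete; the compiler selects the block analytically. */
    int pick_block_size(int n_iters, int p, double t_comp,
                        double alpha, double beta, int bytes_per_iter) {
        int best_b = 1;
        double best_t = 1e300;
        for (int b = 1; b <= n_iters; b++) {
            double stages  = p + (double)n_iters / b - 1.0;
            double stage_t = b * t_comp
                           + alpha + beta * (double)b * bytes_per_iter;
            double t = stages * stage_t;
            if (t < best_t) { best_t = t; best_b = b; }
        }
        return best_b;
    }

Message coalescing and aggregation reduce startup overhead in the same spirit as the vectorized send above, by merging redundant or independent messages bound for the same processor into a single transfer.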