Compiling for Distributed Memory Architectures

Authors:
A. Rogers; K. Pingali
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1994

Abstract

The lack of high-level languages and good compilers for parallel machines hinders their widespread acceptance and use. Without such tools, programmers must themselves address issues such as process decomposition, synchronization, and load balancing. We have developed a parallelizing compiler that, given a sequential program and a memory layout of its data, performs process decomposition while balancing parallelism against locality of reference. A process decomposition is obtained by specializing the program for each processor to the data that resides on that processor. If this analysis fails, the compiler falls back to a simple but inefficient scheme called run-time resolution, in which each process's role in the computation is determined at run time by examining the data required for execution. Thus, our approach to process decomposition is data-driven rather than program-driven. We discuss several message optimizations that address the issues of overhead and synchronization in message transmission. Accumulation reorganizes the computation of a commutative and associative operator to reduce message traffic. Pipelining sends a value as soon after its computation as possible to increase parallelism. Vectorization of messages combines messages with the same source and the same destination to reduce overhead. Our results from experiments in parallelizing SIMPLE, a large hydrodynamics benchmark, for the Intel iPSC/2, show a speedup within 60% to 70% of handwritten code.
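As a rough illustration of two ideas named in the abstract, the following sketch simulates run-time resolution and message vectorization in a single Python process. It is not the authors' compiler or its generated code. The block distribution, the array-reversal statement a[i] = b[n-1-i], and all function names (owner, reverse_runtime_resolution, reverse_vectorized) are assumptions chosen for the example, and "messages" are only counted, never actually sent.

```python
from collections import defaultdict


def owner(i, n, p):
    """Hypothetical block distribution: which of p processors holds index i."""
    block = (n + p - 1) // p  # ceiling division gives the block size
    return i // block


def reverse_runtime_resolution(b, p):
    """Simulate a[i] = b[n-1-i] with an ownership check per element.

    Every element whose source and destination owners differ costs one
    single-value message. Returns (a, number_of_messages).
    """
    n = len(b)
    a = [None] * n
    messages = 0
    for i in range(n):
        j = n - 1 - i
        src, dst = owner(j, n, p), owner(i, n, p)
        if src != dst:
            messages += 1  # owner of b[j] sends one value to owner of a[i]
        a[i] = b[j]        # owner of a[i] performs the assignment
    return a, messages


def reverse_vectorized(b, p):
    """Same computation, but values are batched per (source, destination) pair.

    Returns (a, number_of_messages), where each batch counts as one message.
    """
    n = len(b)
    a = [None] * n
    outbox = defaultdict(list)  # (src, dst) -> list of (target index, value)
    for i in range(n):
        j = n - 1 - i
        src, dst = owner(j, n, p), owner(i, n, p)
        if src != dst:
            outbox[(src, dst)].append((i, b[j]))
        else:
            a[i] = b[j]  # purely local assignment, no message needed
    for payload in outbox.values():
        for i, value in payload:  # receiver unpacks one combined message
            a[i] = value
    return a, len(outbox)


if __name__ == "__main__":
    data = list(range(8))
    print(reverse_runtime_resolution(data, 2))
    print(reverse_vectorized(data, 2))
```

Under these assumptions, both functions compute the same array, while the vectorized version counts one message per communicating processor pair instead of one per element. Accumulation and pipelining are not modeled here.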