Automatic alignment of array data and processes to reduce communication time on DMPPs

Authors:
Michael Philippsen
Affiliations:
ICSI, International Computer Science Institute, Berkeley, CA and Dept. of Informatics, University of Karlsruhe
Venue:
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Year:
1995


Abstract

This paper investigates the problem of aligning array data and processes in a distributed-memory implementation. We present complete algorithms for compile-time analysis, the necessary program restructuring, and subsequent code generation, and discuss their complexity. Finally, we evaluate their practical usefulness in quantitative experiments.

The technique presented analyzes complete programs, including branches, loops, and nested parallelism. Alignment is determined with respect to offset, stride, and general axis relations. The placement of both data and processes is computed in a unifying framework based on an extended preference graph and its analysis. Dynamic redistributions are derived.

The experimental results are very encouraging. The optimization algorithms implemented in our Modula-2* compiler improved the execution times of the programs by an average of over 40% on a MasPar MP-1 with 16384 processors.
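As an illustration of what offset alignment means, the following minimal sketch counts how many iterations of a statement such as a[i] = b[i + 1] need an operand from another processor, with and without shifting b by one position. It is not the paper's algorithm: the block distribution, the function names `owner` and `remote_accesses`, and the problem sizes are all assumptions chosen for the example.

```python
def owner(index, offset, block):
    """Processor owning an element under a block distribution (assumed layout),
    after the whole array has been shifted by `offset` positions."""
    return (index + offset) // block


def remote_accesses(n, block, offset_b):
    """Count iterations of a[i] = b[i + 1] whose two operands
    live on different processors. Array a is never shifted."""
    count = 0
    for i in range(n - 1):
        if owner(i, 0, block) != owner(i + 1, offset_b, block):
            count += 1
    return count


# Hypothetical sizes: 16 elements, 4 elements per processor.
unaligned = remote_accesses(n=16, block=4, offset_b=0)   # b stored as declared
aligned = remote_accesses(n=16, block=4, offset_b=-1)    # b shifted so b[i + 1] sits with a[i]
print(unaligned, aligned)
```

With b stored as declared, every block boundary causes one remote access; shifting b by one position places b[i + 1] on the processor that holds a[i], so the statement runs without communication in this toy setting.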