Efficient Address Generation for Affine Subscripts in Data-Parallel Programs

Authors:
Kuei-Ping Shih;Jang-Ping Sheu;Chih-Yung Chang
Affiliations:
Department of Computer Science and Information Engineering, Tamkang University, Tamsui, Taipei, Taiwan, kpshih@tkvr.tku.edu.tw; Department of Computer Science and Information Engineering, National Central University, Chung-Li 32054, Taiwan, sheujp@csie.ncu.edu.tw; Department of Computer and Information Science, Aletheia University, Tamsui, Taipei, Taiwan, changcy@email.au.edu.tw
Venue:
The Journal of Supercomputing
Year:
2000

Abstract

Address generation is an important and necessary phase when a parallelizing compiler translates programs written in HPF into executable SPMD code. This paper presents an efficient compilation technique for generating the local memory access sequences of block-cyclically distributed array references with affine subscripts in data-parallel programs. When an array reference with an affine subscript appears within a two-nested loop, its memory accesses exhibit repetitive patterns at both the outer and the inner loop. We use tables to record the memory accesses of these repetitive patterns. Based on these tables, a new start-computation algorithm is proposed to compute the starting element on a processor for each outer loop iteration. The complexity of the table constructions is O(k + s2), where k is the distribution block size and s2 is the access stride of the inner loop. Once the tables are constructed, the starting element for each outer loop iteration can be generated in O(1) time. Moreover, we show that the outer loop pattern repeats every Pk/gcd(Pk, s1) iterations, where P is the number of processors and s1 is the access stride of the outer loop. Therefore, the total complexity of generating the local memory access sequences for a block-cyclically distributed array with an affine subscript in a two-nested loop is O(Pk/gcd(Pk, s1) + k + s2).
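
The quantities named in the abstract can be made concrete with a small brute-force reference. The sketch below is not the table-based algorithm of the paper; it simply enumerates every iteration. It assumes the usual cyclic(k) mapping of a one-dimensional array over P processors, a subscript of the form s1*i + s2*j + offset, and loops that start at 0. All function names and the `offset` parameter are hypothetical and chosen only for illustration.

```python
from math import gcd


def owner_and_local(g, P, k):
    """Map global index g to (owning processor, local address).

    Assumes a cyclic(k) distribution over P processors: blocks of k
    consecutive elements are dealt out to the processors round-robin.
    """
    block = g // k                      # which block g falls in
    owner = block % P                   # blocks are dealt round-robin
    local = (block // P) * k + g % k    # earlier local blocks + offset in block
    return owner, local


def outer_period(P, k, s1):
    """Number of outer iterations after which the pattern repeats."""
    return (P * k) // gcd(P * k, s1)


def start_offsets(P, k, s1, offset, n_outer):
    """Position inside the Pk-wide distribution cycle of the first element
    touched by each outer iteration (the inner index j is 0)."""
    return [(offset + s1 * i) % (P * k) for i in range(n_outer)]


def local_accesses(p, P, k, s1, s2, offset, n_outer, n_inner):
    """Brute-force local access sequence on processor p.

    Visits every (i, j) pair, so it costs O(n_outer * n_inner); it is
    only a reference to compare faster methods against.
    """
    seq = []
    for i in range(n_outer):
        for j in range(n_inner):
            owner, local = owner_and_local(offset + s1 * i + s2 * j, P, k)
            if owner == p:
                seq.append(local)
    return seq
```

For example, `outer_period(4, 8, 6)` evaluates Pk/gcd(Pk, s1) for P = 4, k = 8, s1 = 6, and `start_offsets(4, 8, 6, 3, 40)` lets one check that the starting positions recur with exactly that period. A table-driven method such as the one the abstract describes would avoid the full enumeration done by `local_accesses`.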