Concrete mathematics: a foundation for computer science
Concrete mathematics: a foundation for computer science
Compile-time generation of regular communications patterns
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Vienna Fortran—a Fortran language extension for distributed memory multiprocessors
Languages, compilers and run-time environments for distributed memory machines
The high performance Fortran handbook
The high performance Fortran handbook
Generating communication for array statements: design, implementation, and evaluation
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Compilation techniques for block-cyclic distributions
ICS '94 Proceedings of the 8th international conference on Supercomputing
An optimizing Fortran D compiler for MIMD distributed-memory machines
An optimizing Fortran D compiler for MIMD distributed-memory machines
Generating local addresses and communication sets for data-parallel programs
Journal of Parallel and Distributed Computing
A linear-time algorithm for computing the memory access sequence in data-parallel programs
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Efficient address generation for block-cyclic distributions
ICS '95 Proceedings of the 9th international conference on Supercomputing
Compiling array expressions for efficient execution on distributed-memory machines
Journal of Parallel and Distributed Computing
Journal of Parallel and Distributed Computing - Special issue on compilation techniques for distributed memory systems
An Empirical Study of Fortran Programs for Parallelizing Compilers
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
Code Generation for Complex Subscripts in Data-Parallel Programs
LCPC '97 Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing
Hi-index | 0.00 |
Address generation for compiling programs, written in HPF, to executable SPMD code is an important and necessary phase in a parallelizing compiler. This paper presents an efficient compilation technique to generate the local memory access sequences for block-cyclically distributed array references with affine subscripts in data-parallel programs. For the memory accesses of an array reference with affine subscript within a two-nested loop, there exist repetitive patterns both at the outer and inner loops. We use tables to record the memory accesses of repetitive patterns. According to these tables, a new start-computation algorithm is proposed to compute the starting elements on a processor for each outer loop iteration. The complexities of the table constructions are O(k+s2), where k is the distribution block size and s2 is the access stride for the inner loop. After tables are constructed, generating each starting element for each outer loop iteration can run in O(1) time. Moreover, we also show that the repetitive iterations for outer loop are Pk/gcd(Pk, s1), where P is the number of processors and s1 is the access stride for the outer loop. Therefore, the total complexity to generate the local memory access sequences for a block-cyclically distributed array with affine subscript in a two-nested loop is O(Pk/gcd(Pk, s1)+k+s2).