Efficient Address Generation for Affine Subscripts in Data-Parallel Programs

  • Authors:
  • Kuei-Ping Shih; Jang-Ping Sheu; Chih-Yung Chang

  • Affiliations:
  • Department of Computer Science and Information Engineering, Tamkang University, Tamsui, Taipei, Taiwan, kpshih@tkvr.tku.edu.tw; Department of Computer Science and Information Engineering, National Central University, Chung-Li 32054, Taiwan, sheujp@csie.ncu.edu.tw; Department of Computer and Information Science, Aletheia University, Tamsui, Taipei, Taiwan, changcy@email.au.edu.tw

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2000

Abstract

Address generation is a necessary phase when a parallelizing compiler translates programs written in HPF into executable SPMD code. This paper presents an efficient compilation technique for generating the local memory access sequences of block-cyclically distributed array references with affine subscripts in data-parallel programs. When an array reference with an affine subscript appears inside a two-level nested loop, its memory accesses exhibit repetitive patterns in both the outer and the inner loop. We record the memory accesses of these repetitive patterns in tables. Based on the tables, we propose a new start-computation algorithm that computes the starting element on a processor for each outer loop iteration. Constructing the tables takes O(k+s2) time, where k is the distribution block size and s2 is the access stride of the inner loop. Once the tables are constructed, each starting element for an outer loop iteration can be generated in O(1) time. Moreover, we show that the outer-loop pattern repeats every Pk/gcd(Pk, s1) iterations, where P is the number of processors and s1 is the access stride of the outer loop. Therefore, the total complexity of generating the local memory access sequences for a block-cyclically distributed array with an affine subscript in a two-level nested loop is O(Pk/gcd(Pk, s1)+k+s2).
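
The repetition claim for the outer loop can be checked with a small brute-force sketch. This is not the paper's table-based algorithm: it assumes a reference of the form A[s1*i + s2*j + c] with non-negative indices, the usual cyclic(k) mapping of global indices to processors, and hypothetical helper names (owner, local_address, outer_period, first_inner_iteration) chosen only for illustration.

```python
from math import gcd


def owner(g, P, k):
    """Processor that owns global index g under a cyclic(k) distribution."""
    return (g // k) % P


def local_address(g, P, k):
    """Offset of global index g within its owner's local memory."""
    return (g // (P * k)) * k + g % k


def outer_period(P, k, s1):
    """Outer-loop iterations after which the access pattern repeats."""
    return (P * k) // gcd(P * k, s1)


def first_inner_iteration(p, P, k, s1, s2, c, i, n_inner):
    """Brute-force search for the first j whose element lives on processor p."""
    for j in range(n_inner):
        if owner(s1 * i + s2 * j + c, P, k) == p:
            return j
    return None  # processor p owns no element in this outer iteration


# Illustrative parameters (assumed, not taken from the paper).
P, k, s1, s2, c = 4, 4, 6, 3, 0
period = outer_period(P, k, s1)

# Starting inner iteration on processor 1 for two consecutive periods.
starts = [first_inner_iteration(1, P, k, s1, s2, c, i, 16)
          for i in range(2 * period)]
print(period)
print(starts[:period] == starts[period:])
```

With these values Pk = 16 and gcd(16, 6) = 2, so the starting positions repeat every 8 outer iterations. The search in first_inner_iteration is linear in the inner loop bound; the paper's contribution is to replace that search with an O(1) table lookup after O(k+s2) preprocessing.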