Global Partitioning of Parallel Loops and Data Arrays for Caches and Distributed Memory in Multiprocessors

  • Authors:
  • R. K. Barua

  • Venue:
  • Global Partitioning of Parallel Loops and Data Arrays for Caches and Distributed Memory in Multiprocessors
  • Year:
  • 1990

Abstract

This thesis presents a solution to the problem of automatically partitioning loops and arrays for cache-coherent distributed-memory multiprocessors. The compiler algorithm described targets such machines, though it also handles machines without caches. A loop partition specifies how loop iterations are distributed across the processors; a data partition specifies how arrays are distributed. Loops are partitioned to obtain good cache reuse, while data partitioning aims to make most array references access the local memory of the processor issuing them. The problems of finding loop and data partitions are interrelated and must be solved together. Our algorithm handles programs with multiple nested parallel loops that access many arrays, where the array access indices are general affine functions of the loop variables. We present a cost model that estimates the cost of a loop and data partition given machine parameters such as cache, local-memory, and remote-memory access times. Minimizing the cost estimated by our model is an NP-complete problem, as is the fully general partitioning problem. We therefore present a heuristic method that finds solutions in polynomial time. The scheme has been fully implemented in our compiler for the Alewife machine. We demonstrate the method on several small program fragments and report performance results for one large application, the conduct routine in SIMPLE, which has 20 parallel loops (both one- and two-dimensional) and 20 data arrays, which are shared by several loops.
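
To make the idea of costing a loop and data partition concrete, here is a minimal Python sketch. It is not the thesis's cost model: the function names, the block distribution, the per-processor "cache" (an unbounded set of touched elements), and the access times of 1, 10, and 100 are all assumptions chosen for illustration. It charges each affine reference a*i + b a cache, local, or remote cost depending on which processor runs iteration i and which processor owns the referenced element.

```python
def block_owner(index, size, num_procs):
    """Processor that owns `index` under a block distribution (assumed scheme)."""
    block = -(-size // num_procs)  # ceiling division
    return index // block


def estimate_cost(n, num_procs, refs, loop_owner, data_owner,
                  t_cache=1, t_local=10, t_remote=100):
    """Toy cost of `for i in range(n)` whose body makes the accesses in `refs`.

    refs        list of (array_name, a, b): the body accesses array[a*i + b]
    loop_owner  maps an iteration i to the processor that executes it
    data_owner  maps (array_name, index) to the processor that stores it
    Every array is assumed to have n elements; out-of-range indices are skipped.
    The timing parameters are illustrative, not measured machine values.
    """
    cached = [set() for _ in range(num_procs)]  # idealized infinite caches
    cost = 0
    for i in range(n):
        p = loop_owner(i)
        for name, a, b in refs:
            idx = a * i + b
            if not 0 <= idx < n:
                continue
            key = (name, idx)
            if key in cached[p]:
                cost += t_cache          # element already fetched by p
            else:
                cached[p].add(key)       # first touch: fetch from memory
                owner = data_owner(name, idx)
                cost += t_local if owner == p else t_remote
    return cost


if __name__ == "__main__":
    N, P = 8, 2
    refs = [("A", 1, 0), ("B", 1, 1)]            # body touches A[i] and B[i+1]
    data = lambda name, idx: block_owner(idx, N, P)
    block_loop = lambda i: block_owner(i, N, P)  # iterations 0-3 on p0, 4-7 on p1
    cyclic_loop = lambda i: i % P                # iterations alternate p0, p1
    print(estimate_cost(N, P, refs, block_loop, data))   # 240
    print(estimate_cost(N, P, refs, cyclic_loop, data))  # 780
```

With block-distributed arrays, the block loop partition pays for only one remote access (B[4], read by processor 0), while the cyclic loop partition sends about half of all references to remote memory. This toy comparison shows why the loop partition and the data partition cannot be chosen independently.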