Generalized multipartitioning of multi-dimensional arrays for parallelizing line-sweep computations

  • Authors:
  • Alain Darte; John Mellor-Crummey; Robert Fowler; Daniel Chavarría-Miranda

  • Affiliations:
  • LIP, ENS-Lyon, 46, Allée d'Italie, 69007 Lyon, France; Department of Computer Science MS-132, Rice University, 6100 Main, Houston, TX (Mellor-Crummey, Fowler, Chavarría-Miranda)

  • Venue:
  • Journal of Parallel and Distributed Computing - Special section: best papers from the 2002 International Parallel and Distributed Processing Symposium
  • Year:
  • 2003

Abstract

Multipartitioning is a strategy for decomposing multi-dimensional arrays into tiles and mapping the resulting tiles onto a collection of processors. This class of partitionings enables efficient parallelization of "line-sweep" computations that solve one-dimensional recurrences along each dimension of a multi-dimensional array. Multipartitionings yield balanced parallelism for line sweeps by assigning each processor the same number of data tiles to compute at each step of a sweep along any array dimension. Also, they induce only coarse-grain communication.

This paper considers the problem of computing generalized multipartitionings, which decompose d-dimensional arrays, d ≥ 2, onto an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. We use a cost model to select the dimensionality of the best partitioning and the number of cuts to make along each array dimension; then, we show how to construct a mapping that assigns the resulting data tiles to each of the processors. The assignment of tiles to processors induced by this class of multipartitionings corresponds to an instance of a latin hyper-rectangle, a natural extension of latin squares, which have been widely studied in mathematics and statistics.

Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that employs our strategy for generalized multipartitioning and show that the compiler's generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
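
To make the balance property concrete, the sketch below shows the simplest 2-D case of a multipartitioning as an illustrative example (it is not the paper's generalized algorithm): an array is cut into p × p tiles and tile (i, j) is assigned to processor (j − i) mod p. The resulting assignment is a latin square, so every tile row and every tile column contains each processor exactly once, and a line sweep along either array dimension keeps all p processors busy at every step. The function names here are hypothetical.

```python
# Illustrative 2-D "diagonal" multipartitioning onto p processors.
# Tile (i, j) -> processor (j - i) mod p yields a latin square:
# each processor owns exactly one tile in every row and every column,
# which is what balances line sweeps along either dimension.

def diagonal_multipartition(p):
    """Return a p x p matrix mapping tile (i, j) to a processor id."""
    return [[(j - i) % p for j in range(p)] for i in range(p)]

def is_latin_square(assignment):
    """Check that each processor appears once per tile row and column."""
    p = len(assignment)
    rows_ok = all(sorted(row) == list(range(p)) for row in assignment)
    cols_ok = all(sorted(col) == list(range(p)) for col in zip(*assignment))
    return rows_ok and cols_ok

if __name__ == "__main__":
    tiles = diagonal_multipartition(4)
    for row in tiles:
        print(row)
    print("balanced:", is_latin_square(tiles))
```

The paper's contribution generalizes this idea to d-dimensional arrays and an arbitrary processor count, where the tile-to-processor assignment forms a latin hyper-rectangle and a cost model chooses the partitioning dimensionality and the number of cuts per dimension.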