Multipartitioning is a strategy for decomposing multi-dimensional arrays into tiles and mapping the resulting tiles onto a collection of processors. This class of partitionings enables efficient parallelization of "line-sweep" computations, which solve one-dimensional recurrences along each dimension of a multi-dimensional array. Multipartitionings yield balanced parallelism for line sweeps by assigning each processor the same number of data tiles to compute at each step of a sweep along any array dimension; they also induce only coarse-grain communication.

This paper considers the problem of computing generalized multipartitionings, which decompose d-dimensional arrays, d ≥ 2, onto an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. We use a cost model to select the dimensionality of the best partitioning and the number of cuts to make along each array dimension; we then show how to construct a mapping that assigns the resulting data tiles to the processors. The assignment of tiles to processors induced by this class of multipartitionings corresponds to an instance of a Latin hyper-rectangle, a natural generalization of the Latin squares that have been widely studied in mathematics and statistics.

Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that employs our strategy for generalized multipartitioning, and we show that the compiler-generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
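To make the balance property concrete, here is a minimal sketch of the classic 2D case: p processors, a p × p tile grid, and the diagonal (Latin-square) assignment of tiles to processors. The helper name `multipartition_2d` is hypothetical (this is not the paper's dHPF-generated code, and the paper's algorithm handles d dimensions and arbitrary processor counts); the sketch only illustrates why every row and every column of tiles contains each processor exactly once, which is what balances a line sweep along either array dimension.

```python
def multipartition_2d(p):
    """Map a p x p tile grid onto p processors.

    Tile (i, j) is assigned to processor (j - i) mod p, the standard
    diagonal mapping. The result is a Latin square: every row and every
    column is a permutation of 0..p-1, so at each step of a sweep along
    rows or along columns, all p processors own exactly one active tile.
    (Hypothetical helper for illustration only.)
    """
    return [[(j - i) % p for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    p = 4
    tiles = multipartition_2d(p)
    # Check the Latin-square property that yields balanced line sweeps.
    for row in tiles:
        assert sorted(row) == list(range(p))
    for col in zip(*tiles):
        assert sorted(col) == list(range(p))
```

A Latin hyper-rectangle extends this idea to d dimensions: the tile counts along each dimension may differ, but every line of tiles along any dimension still visits each processor the same number of times.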