Multipartitioning is a strategy for decomposing multi-dimensional arrays into tiles and mapping the resulting tiles onto a collection of processors. This class of partitionings enables efficient parallelization of "line-sweep" computations, which solve one-dimensional recurrences along each dimension of a multi-dimensional array. Multipartitionings yield balanced parallelism for line sweeps by assigning each processor the same number of data tiles to compute at each step of a sweep along any array dimension; they also induce only coarse-grain communication.

This paper considers the problem of computing generalized multipartitionings, which decompose d-dimensional arrays, d ≥ 2, onto an arbitrary number of processors. We describe an algorithm that computes an optimal multipartitioning onto all of the processors for this general case. We use a cost model to select the dimensionality of the best partitioning and the number of cuts to make along each array dimension; we then show how to construct a mapping that assigns the resulting data tiles to the processors. The assignment of tiles to processors induced by this class of multipartitionings corresponds to an instance of a Latin hyper-rectangle, a natural generalization of the Latin squares that have been widely studied in mathematics and statistics.

Finally, we describe how we extended the Rice dHPF compiler for High Performance Fortran to generate code that employs our strategy for generalized multipartitioning, and we show that the compiler-generated code for the NAS SP computational fluid dynamics benchmark achieves scalable high performance.
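To make the balance property concrete, here is a minimal sketch of the classic 2D case: p processors, a p × p tile grid, and the diagonal (Latin-square) assignment of tiles to processors. The helper name `multipartition_2d` is hypothetical (this is not the paper's dHPF-generated code, and the paper's algorithm handles d dimensions and arbitrary processor counts); the sketch only illustrates why every row and every column of tiles contains each processor exactly once, which is what balances a line sweep along either array dimension.

```python
def multipartition_2d(p):
    """Map a p x p tile grid onto p processors.

    Tile (i, j) is assigned to processor (j - i) mod p, the standard
    diagonal mapping. The result is a Latin square: every row and every
    column is a permutation of 0..p-1, so at each step of a sweep along
    rows or along columns, all p processors own exactly one active tile.
    (Hypothetical helper for illustration only.)
    """
    return [[(j - i) % p for j in range(p)] for i in range(p)]


if __name__ == "__main__":
    p = 4
    tiles = multipartition_2d(p)
    # Check the Latin-square property that yields balanced line sweeps.
    for row in tiles:
        assert sorted(row) == list(range(p))
    for col in zip(*tiles):
        assert sorted(col) == list(range(p))
```

A Latin hyper-rectangle extends this idea to d dimensions: the tile counts along each dimension may differ, but every line of tiles along any dimension still visits each processor the same number of times.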