Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays
IEEE Transactions on Computers
Automatic translation of FORTRAN programs to vector form
ACM Transactions on Programming Languages and Systems (TOPLAS)
The derivation of systolic implementations
Acta Informatica
Solving problems on concurrent processors. Vol. 1: General techniques and regular problems
Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design
IEEE Transactions on Computers - Special issue on architectural support for programming languages and operating systems
Synthesizing Linear Array Algorithms from Nested FOR Loop Algorithms
IEEE Transactions on Computers
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Supercompilers for parallel and vector computers
On high-speed computing with a programmable linear array
The Journal of Supercomputing
Generating explicit communication from shared-memory program references
Proceedings of the 1990 ACM/IEEE conference on Supercomputing
The data alignment phase in compiling programs for distributed-memory machines
Journal of Parallel and Distributed Computing
Compiling Fortran D for MIMD distributed-memory machines
Communications of the ACM
A methodology for high-level synthesis of communication on multicomputers
ICS '92 Proceedings of the 6th international conference on Supercomputing
Automatic array alignment in data-parallel programs
POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
The high performance Fortran handbook
Communication-free hyperplane partitioning of nested loops
Journal of Parallel and Distributed Computing
Integration, the VLSI Journal
Techniques for compiling programs on distributed memory multicomputers
Parallel Computing
Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Automatic data layout for distributed memory machines
Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Optimization and transformation techniques for high performance Fortran
Maximizing parallelism and minimizing synchronization with affine partitions
Parallel Computing - Special issues on languages and compilers for parallel computers
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
An affine partitioning algorithm to maximize parallelism and minimize communication
ICS '99 Proceedings of the 13th international conference on Supercomputing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
High performance Fortran compilation techniques for parallelizing scientific codes
SC '98 Proceedings of the 1998 ACM/IEEE conference on Supercomputing
A Systolic Array Parallelizing Compiler
Advanced Computer Architecture: Parallelism, Scalability, Programmability
High Performance Compilers for Parallel Computing
The Generation of a Class of Multipliers: Synthesizing Highly Parallel Algorithms in VLSI
IEEE Transactions on Computers
Distributed Memory Compiler Design For Sparse Problems
IEEE Transactions on Computers
Mapping Nested Loop Algorithms into Multidimensional Systolic Arrays
IEEE Transactions on Parallel and Distributed Systems
Compiling Communication-Efficient Programs for Massively Parallel Machines
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
On Time Mapping of Uniform Dependence Algorithms into Lower Dimensional Processor Arrays
IEEE Transactions on Parallel and Distributed Systems
Compiling for Distributed Memory Architectures
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
On Supernode Transformation with Minimized Total Running Time
IEEE Transactions on Parallel and Distributed Systems
On Privatization of Variables for Data-Parallel Execution
IPPS '97 Proceedings of the 11th International Symposium on Parallel Processing
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
LCPC '96 Proceedings of the 9th International Workshop on Languages and Compilers for Parallel Computing
Array Distribution in Data-Parallel Programs
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Communication-Free Parallelization via Affine Transformations
LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing
Data Relation Vectors: A New Abstraction for Data Optimizations
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
A systolic array optimizing compiler
Optimal communication primitives and graph embeddings on hypercubes
Automatic generation of systolic programs from nested loops
Automatic data and computation mapping for distributed-memory machines
Compiler techniques for optimizing communication and data distribution for distributed-memory multicomputers
Automatic computation and data decomposition for multiprocessors
Research note: Modeling distributed data representation and its effect on parallel data accesses
Journal of Parallel and Distributed Computing - Special issue: Design and performance of networks for super-, cluster-, and grid-computing: Part I
Choosing colors for geometric graphs via color space embeddings
GD'06 Proceedings of the 14th international conference on Graph drawing
Inferring arbitrary distributions for data and computation
Proceedings of the ACM international conference companion on Object oriented programming systems languages and applications companion
Partitioning applications for hybrid and federated clouds
CASCON '12 Proceedings of the 2012 Conference of the Center for Advanced Studies on Collaborative Research
To exploit parallelism on shared-memory parallel computers (SMPCs), it is natural to focus on decomposing the computation, mainly by distributing the iterations of nested Do-loops. In contrast, on distributed-memory parallel computers (DMPCs), both the decomposition of computation and the distribution of data must be handled, in order to balance the computation load and to minimize data migration. We propose, and validate experimentally, a method that treats computation and data decomposition synergistically so as to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of the data arrays in a program and to designate some of them as dominant; the intuition is that the dominant arrays are the ones whose migration would be the most expensive. Exploiting the correspondence between iteration-space mapping vectors and the distributed dimensions of the dominant data array in each nested Do-loop, we design algorithms that determine data and computation decompositions simultaneously. Given a data distribution, the computation decomposition for each nested Do-loop is determined by either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we apply tiling to enable pipelining and the overlapping of computation and communication. However, to use tiling on DMPCs, we had to extend existing techniques for determining tiling vectors and tile sizes, which were originally suited only to SMPCs. The overall method is illustrated on programs for the 2D heat equation, Gaussian elimination with pivoting, and the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.
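The "owner computes" rule mentioned above can be illustrated with a minimal sketch (the function names and the block distribution are illustrative assumptions, not the paper's actual algorithms): the dominant array's distributed dimension is block-partitioned over the processors, and each processor then executes only those loop iterations whose left-hand-side element it owns.

```python
def block_owner(i, n, p_count):
    """Owner of element i under a block distribution of n elements
    over p_count processors (ceiling-sized blocks)."""
    block = (n + p_count - 1) // p_count  # ceiling division
    return i // block

def my_iterations(n, p, p_count):
    """Iterations of `for i in range(n): a[i] = f(...)` that the
    owner-computes rule assigns to processor p, assuming a[] is the
    dominant array and its (only) dimension is block-distributed."""
    return [i for i in range(n) if block_owner(i, n, p_count) == p]

# Example: a 10-iteration loop over 4 processors (block size 3).
parts = [my_iterations(10, p, 4) for p in range(4)]
# Every iteration is executed by exactly one processor, and each
# processor writes only array elements it owns, so no write migrates.
```

Under the "owner stores" variant described in the abstract, the same guard would instead be driven by the array element that receives the final stored value; the partitioning machinery is analogous.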