High-performance computer architecture
Stencils and problem partitionings: their influence on the performance of multiple processor systems
IEEE Transactions on Computers
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers and environments for Parallel Programming
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Minimum Distance: A Method for Partitioning Recurrences for Multiprocessors
IEEE Transactions on Computers
The influence of parallel decomposition strategies on the performance of multiprocessor systems
ISCA '85 Proceedings of the 12th annual international symposium on Computer architecture
On program restructuring, scheduling, and communication for parallel processor systems
Beyond loop partitioning: data assignment and overlap to reduce communication overhead
ICS '91 Proceedings of the 5th international conference on Supercomputing
Automatic transformation of FORTRAN loops to reduce cache conflicts
ICS '91 Proceedings of the 5th international conference on Supercomputing
Fortran at ten gigaflops: the Connection Machine convolution compiler
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Seismic modeling at 14 gigaflops on the Connection Machine
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Model-driven mapping onto distributed memory parallel computers
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Access normalization: loop restructuring for NUMA computers
ACM Transactions on Computer Systems (TOCS)
PARADIGM: a compiler for automatic data distribution on multicomputers
ICS '93 Proceedings of the 7th international conference on Supercomputing
Unifying data and control transformations for distributed shared-memory machines
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Automatic data layout for high performance Fortran
Supercomputing '95 Proceedings of the 1995 ACM/IEEE conference on Supercomputing
An Efficient Solution to the Cache Thrashing Problem Caused by True Data Sharing
IEEE Transactions on Computers
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers
IEEE Transactions on Parallel and Distributed Systems
A General Data Layout for Distributed Consistency in Data Parallel Applications
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
An Adaptive Approach to Data Placement
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Fortran RED - A Retargetable Environment for Automatic Data Layout
LCPC '98 Proceedings of the 11th International Workshop on Languages and Compilers for Parallel Computing
Automatic Data Layout Using 0-1 Integer Programming
PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques
This paper uses bottom-up, static program partitioning to minimize the execution time of parallel programs by reducing interprocessor communication. Program partitioning is applied to a parallel programming construct known as a sequentially iterated parallel loop. The paper develops and evaluates compiler techniques that automatically generate data partitions for sequentially iterated parallel loops so as to minimize interprocessor communication; these techniques could be included as a communication-reducing back end for existing parallelizing compilers.

A multiprocessor model is defined to analyze the program execution factors that lead to interprocessor communication, and a model of the sequentially iterated parallel loop is defined to quantify communication patterns within a program. A vector notation is chosen to quantify communication across a global data set. Communication parameters derived from this notation are used in equations that are shown to generate rectangular and hexagonal partition shapes minimizing interprocessor communication.

Experiments measuring the impact of communication on program performance, and the improvement gained by the automatically generated data partitions, were conducted on a sixteen-processor Encore Multimax. The new method outperformed established data partitioning methods.