Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
Strategies for cache and local memory management by global program transformation
Journal of Parallel and Distributed Computing - Special Issue on Languages, Compilers, and Environments for Parallel Programming
On the problem of optimizing data transfers for complex memory systems
ICS '88 Proceedings of the 2nd international conference on Supercomputing
Semi-automatic process partitioning for parallel computation
International Journal of Parallel Programming
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Process decomposition through locality of reference
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
A methodology for parallelizing programs for multicomputers and complex memory multiprocessors
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Compiler techniques for data partitioning of sequentially iterated parallel loops
ICS '90 Proceedings of the 4th international conference on Supercomputing
The parallel execution of DO loops
Communications of the ACM
Optimizing Supercompilers for Supercomputers
Pipelined Data Parallel Algorithms-II: Design
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Mapping Nested Loops on Multiprocessor Systems
IEEE Transactions on Parallel and Distributed Systems
Compiling Global Name-Space Parallel Loops for Distributed Execution
IEEE Transactions on Parallel and Distributed Systems
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Compile-Time Techniques for Data Distribution in Distributed Memory Machines
IEEE Transactions on Parallel and Distributed Systems
Data-localization for Fortran macro-dataflow computation using partial static task assignment
ICS '96 Proceedings of the 10th international conference on Supercomputing
Optimal Data Scheduling for Uniform Multidimensional Applications
IEEE Transactions on Computers
Efficient Algorithms for Data Distribution on Distributed Memory Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Statement-Level Communication-Free Partitioning Techniques for Parallelizing Compilers
The Journal of Supercomputing
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Communication-free partitioning of nested loops
Compiler optimizations for scalable parallel systems
A compilation method for communication-efficient partitioning of DOALL loops
Compiler optimizations for scalable parallel systems
Automatic data and computation decomposition on distributed memory parallel computers
ACM Transactions on Programming Languages and Systems (TOPLAS)
The Journal of Supercomputing
ISHPC '00 Proceedings of the Third International Symposium on High Performance Computing
Practical parallel computing
Optimization of Data Distribution and Processor Allocation Problem Using Simulated Annealing
The Journal of Supercomputing
Linear data distribution based on index analysis
High performance scientific and engineering computing
Memetic algorithms for parallel code optimization
International Journal of Parallel Programming
Optimizing shared cache behavior of chip multiprocessors
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
In distributed memory multicomputers, local memory accesses are much faster than those involving interprocessor communication. To reduce or even eliminate interprocessor communication, the array elements in a program must be carefully distributed across the local memories of the processors for parallel execution. We focus on techniques that allow parallelizing compilers to allocate the array elements of nested loops onto multicomputers in a communication-free fashion. We first analyze the pattern of references among all arrays referenced by a nested loop, and then partition the iteration space into blocks that require no interblock communication. The arrays can be partitioned under communication-free criteria with either nonduplicate or duplicate data. Finally, we propose a heuristic method for mapping the partitioned array elements and iterations onto fixed-size multicomputers while taking load balancing into account. With these methods, nested loops can execute without any communication overhead on distributed memory multicomputers. Moreover, we study the performance of the nonduplicate- and duplicate-data strategies for matrix multiplication.
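The idea of communication-free partitioning with duplicate data can be illustrated with matrix multiplication, the abstract's own case study. The sketch below is a minimal illustration under assumed conventions, not the paper's actual algorithm: C = A × B is partitioned by row blocks, each "processor" holds its rows of A and a duplicated copy of B, so every block computes its rows of C with no interblock communication. The function names (`partition_rows`, `parallel_matmul`) are hypothetical.

```python
# Hypothetical sketch: communication-free row-block partitioning of
# C = A @ B with B duplicated on every processor. Each block reads only
# its own rows of A and its local copy of B -- no interblock traffic.

def partition_rows(n_rows, n_procs):
    """Split row indices 0..n_rows-1 into n_procs load-balanced blocks."""
    base, extra = divmod(n_rows, n_procs)
    blocks, start = [], 0
    for p in range(n_procs):
        size = base + (1 if p < extra else 0)  # spread the remainder
        blocks.append(range(start, start + size))
        start += size
    return blocks

def matmul_block(A, B, rows):
    """One processor's work: compute only its assigned rows of C."""
    k, n = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(n)]
            for i in rows]

def parallel_matmul(A, B, n_procs):
    """Conceptually parallel; executed sequentially here for clarity."""
    C = []
    for rows in partition_rows(len(A), n_procs):
        C.extend(matmul_block(A, B, rows))
    return C
```

For example, `parallel_matmul([[1, 2], [3, 4], [5, 6]], [[7, 8], [9, 10]], 2)` returns `[[25, 28], [57, 64], [89, 100]]`. Duplicating B trades memory for the elimination of all communication, which is the duplicate-data side of the trade-off the abstract evaluates.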