Process decomposition through locality of reference. PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation.
A report on the Sisal language project. Journal of Parallel and Distributed Computing - Special issue: data-flow processing.
Optimization of array accesses by collective loop transformations. ICS '91 Proceedings of the 5th international conference on Supercomputing.
Compiling Fortran D for MIMD distributed-memory machines. Communications of the ACM.
Global optimizations for parallelism and locality on scalable parallel machines. PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation.
Automatic array alignment in data-parallel programs. POPL '93 Proceedings of the 20th ACM SIGPLAN-SIGACT symposium on Principles of programming languages.
Program partitioning for NUMA multiprocessor computer systems. Journal of Parallel and Distributed Computing - Special issue on performance of supercomputers.
Communication-free hyperplane partitioning of nested loops. Journal of Parallel and Distributed Computing.
Compiling Fortran 90D/HPF for distributed memory MIMD computers. Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming.
Data and task alignment in distributed memory architectures. Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming.
An approach to communication-efficient data redistribution. ICS '94 Proceedings of the 8th international conference on Supercomputing.
Compiling Functional Parallelism on Distributed-Memory Systems. IEEE Parallel & Distributed Technology: Systems & Technology.
A Scalable Scheduling Scheme for Functional Parallelism on Distributed Memory Multiprocessor Systems. IEEE Transactions on Parallel and Distributed Systems.
Data and workload distribution in a multithreaded architecture. Journal of Parallel and Distributed Computing.
Compiling Communication-Efficient Programs for Massively Parallel Machines. IEEE Transactions on Parallel and Distributed Systems.
Compiling Global Name-Space Parallel Loops for Distributed Execution. IEEE Transactions on Parallel and Distributed Systems.
Compile-Time Techniques for Data Distribution in Distributed Memory Machines. IEEE Transactions on Parallel and Distributed Systems.
Automatic Extraction of Functional Parallelism from Ordinary Programs. IEEE Transactions on Parallel and Distributed Systems.
IEEE Transactions on Parallel and Distributed Systems.
On the Granularity and Clustering of Directed Acyclic Task Graphs. IEEE Transactions on Parallel and Distributed Systems.
Communication-Free Data Allocation Techniques for Parallelizing Compilers on Multicomputers. IEEE Transactions on Parallel and Distributed Systems.
Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing.
Communication-Free Parallelization via Affine Transformations. LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing.
Solving Alignment Using Elementary Linear Algebra. LCPC '94 Proceedings of the 7th International Workshop on Languages and Compilers for Parallel Computing.
A New Approach to Array Redistribution: Strip Mining Redistribution. PARLE '94 Proceedings of the 6th International PARLE Conference on Parallel Architectures and Languages Europe.
Automatic Data Layout Using 0-1 Integer Programming. PACT '94 Proceedings of the IFIP WG10.3 Working Conference on Parallel Architectures and Compilation Techniques.
SPDP '96 Proceedings of the 8th IEEE Symposium on Parallel and Distributed Processing (SPDP '96).
Compilation Techniques for Optimizing Communication on Distributed-Memory Systems. ICPP '93 Proceedings of the 1993 International Conference on Parallel Processing - Volume 02.
Optimizing Data Decomposition for Data Parallel Programs. ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02.
Communication Optimizations Used in the Paradigm Compiler for Distributed-Memory Multicomputers. ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02.
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02.
Sending and receiving data carries significant communication overhead. Loop partitioning approaches on distributed-memory systems must therefore guarantee not just computation load balance but computation+communication load balance. Previous loop partitioning approaches have achieved communication-free, computation-load-balanced iteration space partitioning for a limited subset of DOALL loops [6]. However, a large class of DOALL loops inevitably incurs communication. For those loops, the tradeoff between computation and communication must be analyzed carefully to balance the combined computation time and communication overhead. In this work, we describe a partitioning approach, based on this motivation, for the general case of DOALL loops. Our goal is to achieve a computation+communication load-balanced partitioning through static data and iteration space distribution. First, the code partitioning phase analyzes the references in the body of the DOALL loop nest. It then determines a set of directions that eliminate a large amount of communication at the cost of a smaller loss of parallelism. The partitioning is carried out in the iteration space of the loop by cyclically following this set of direction vectors. As a result, data references are maximally localized and reused, which eliminates a large volume of communication. A new "larger partition owns" rule is formulated to minimize the communication overhead of a compute-intensive partition. It does so by localizing that partition's references to a greater extent than those of a smaller, less compute-intensive partition. A Partition Interaction Graph is then constructed and used to merge the partitions. The merging achieves granularity adjustment, computation+communication load balance, and mapping onto the actual number of available processors. The relevant theory and algorithms are developed, along with a performance evaluation on the Cray T3D.
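The partitioning and ownership steps can be illustrated with a small Python sketch for a two-dimensional DOALL loop. This is a minimal illustration under stated assumptions, not the authors' algorithm. The assumptions are:

- The loop body `A[i][j] = B[i][j] + B[i-1][j-1]` and its reference offsets are hypothetical.
- Only a single direction vector is followed, whereas the approach above cycles through a set of them.
- Each iteration costs one unit of work, and each received element costs a hypothetical `comm_cost` units.
- Out-of-bounds boundary references are ignored.

The helper names `partition_along`, `larger_partition_owns`, and `partition_costs` are illustrative. The returned `pig` dictionary holds Partition Interaction Graph edge weights as (sender, receiver) element counts. The later merging and processor-mapping step is not modeled.

```python
from collections import defaultdict


def partition_along(n, d):
    """Split the n x n iteration space {(i, j) : 0 <= i, j < n} into chains
    that follow the direction vector d (single-direction simplification).

    Scanning in row-major (lexicographic) order, each unassigned iteration
    starts a new partition that keeps stepping by d while it stays inside the
    space. d must be lexicographically positive, so every chain starts from
    its first point and no iteration is assigned twice."""
    assert d > (0, 0), "direction vector must be lexicographically positive"
    assigned = set()
    parts = []
    for i in range(n):
        for j in range(n):
            if (i, j) in assigned:
                continue
            chain, (x, y) = [], (i, j)
            while 0 <= x < n and 0 <= y < n:
                assigned.add((x, y))
                chain.append((x, y))
                x, y = x + d[0], y + d[1]
            parts.append(chain)
    return parts


def larger_partition_owns(n, parts, offsets):
    """Assign every referenced element of the read array to the largest
    partition that reads it (ties go to the lower partition id).

    Returns (owner, recv, pig): recv[p] counts elements partition p must
    receive; pig[(src, dst)] counts elements sent from src to dst, i.e. the
    edge weights of a Partition Interaction Graph."""
    refs = defaultdict(set)  # element -> ids of partitions that read it
    for pid, chain in enumerate(parts):
        for (i, j) in chain:
            for (di, dj) in offsets:
                x, y = i + di, j + dj
                if 0 <= x < n and 0 <= y < n:  # hypothetical: skip boundary refs
                    refs[(x, y)].add(pid)
    owner, recv, pig = {}, defaultdict(int), defaultdict(int)
    for elem, readers in refs.items():
        # "Larger partition owns": the reader with the most iterations keeps
        # the element local; every other reader must receive a copy.
        own = max(readers, key=lambda p: (len(parts[p]), -p))
        owner[elem] = own
        for p in readers - {own}:
            recv[p] += 1
            pig[(own, p)] += 1
    return owner, recv, pig


def partition_costs(parts, recv, comm_cost):
    """Computation + communication load per partition under a hypothetical
    cost model: one unit per iteration plus comm_cost per received element."""
    return [len(chain) + comm_cost * recv[p] for p, chain in enumerate(parts)]


if __name__ == "__main__":
    n = 4
    offsets = [(0, 0), (-1, -1)]  # hypothetical body: A[i][j] = B[i][j] + B[i-1][j-1]
    for d in [(0, 1), (1, 1)]:
        parts = partition_along(n, d)
        _, recv, pig = larger_partition_owns(n, parts, offsets)
        # columns: direction, #partitions, elements communicated, max load
        print(d, len(parts), sum(recv.values()), max(partition_costs(parts, recv, 2)))
    # prints:
    # (0, 1) 4 9 10
    # (1, 1) 7 0 4
```

For this hypothetical body, following d = (0, 1) (rows) leaves 9 elements to communicate. Following d = (1, 1) (diagonals) makes the loop communication-free. The diagonal partitions, however, have uneven sizes (4, 3, 2, 1, 3, 2, 1), which a subsequent merging step would need to balance. With a different reference pattern, for example offsets (0, 0) and (0, -1) under diagonal partitioning, the "larger partition owns" rule leaves the largest (main-diagonal) partition with no incoming communication.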