This paper is concerned with the efficient execution of array computations on distributed-memory architectures by applying compiler-directed program and data transformations. By translating a subset of a single-assignment language, Sisal, into a linear-algebraic framework, a program can be transformed so as to reduce load imbalance and non-local memory access. A new test is presented that allows the construction of transformations to reduce load imbalance. A new formulation of data alignment yields transformations that reduce non-local access. Three criteria for partitioning are given, together with a systematic method to map the data and computation onto the processors. Finally, a new prefetching procedure is presented that avoids redundant non-local accesses.
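The linear-algebraic framework the abstract refers to can be illustrated by the standard device of applying an integer matrix to a loop nest's iteration space. The sketch below is not the paper's algorithm; it is a minimal, self-contained example (the names `transform`, `T`, and `iterations` are hypothetical) showing how a unimodular matrix re-maps a two-dimensional iteration space, the kind of transformation such a framework constructs.

```python
# Illustrative sketch only: loop transformations expressed as linear
# algebra on iteration vectors. A unimodular matrix T (integer entries,
# determinant +/-1) maps the iteration space of a doubly nested loop
# bijectively onto a new iteration space.

def transform(points, T):
    """Apply the 2x2 integer matrix T to each iteration vector (i, j)."""
    return [(T[0][0] * i + T[0][1] * j, T[1][0] * i + T[1][1] * j)
            for (i, j) in points]

# Iteration space of: for i in 0..1: for j in 0..1: ...
iterations = [(i, j) for i in range(2) for j in range(2)]

# Loop skewing: T = [[1, 0], [1, 1]] has determinant 1, so it is a
# bijection on integer points; the transformed loop is legal whenever
# all dependence vectors remain lexicographically positive under T.
T = [[1, 0], [1, 1]]
print(transform(iterations, T))  # [(0, 0), (0, 1), (1, 1), (1, 2)]
```

A compiler working in this setting searches for a matrix T (subject to legality constraints from the dependence vectors) that improves some cost measure, here, load balance across processors or the volume of non-local accesses.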