Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays
IEEE Transactions on Computers
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Scanning polyhedra with DO loops
PPOPP '91 Proceedings of the third ACM SIGPLAN symposium on Principles and practice of parallel programming
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Evaluating compiler optimizations for Fortran D
Journal of Parallel and Distributed Computing - Special issue on data parallel algorithms and programming
Integration, the VLSI Journal
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Iteration Space Tiling for Memory Hierarchies
Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing
Automatic Blocking by a Compiler
Proceedings of the Fifth SIAM Conference on Parallel Processing for Scientific Computing
Precise Tiling for Uniform Loop Nests
ASAP '95 Proceedings of the IEEE International Conference on Application Specific Array Processors
Two-dimensional orthogonal tiling: from theory to practice
HIPC '96 Proceedings of the Third International Conference on High-Performance Computing (HiPC '96)
Automatic Blocking of Nested Loops
Automatic Blocking of Nested Loops
Communication Optimizations Used in the Paradigm Compiler for Distributed-Memory Multicomputers
ICPP '94 Proceedings of the 1994 International Conference on Parallel Processing - Volume 02
Hi-index | 0.00 |
We discuss in this paper the problem of finding the optimal tiling transformation of three-dimensional uniform recurrences on a two-dimensional torus/grid of distributed-memory general-purpose machines. We show that even for the simplest case of recurrences which allows for such transformation, the corresponding problem of minimizing the total running time is a non-trivial non-linear integer programming problem. For the later we derive an O(1) algorithm for finding a good approximation solution. The theoretical evaluations and the experimental results show that the obtained solution approximates the original minimum sufficiently well in the context of the considered problem. Such analytical results are of obvious interest and can be successfully used in parallelizing compilers as well as in performance tuning of parallel codes.