The Journal of Supercomputing
Loop tiling is an efficient loop transformation, used mainly to expose coarse-grained parallelism in loop nests. Applying n-dimensional non-rectangular tiles to generate parallel loops is a difficult task. This paper presents an efficient scheme for applying non-rectangular n-dimensional tiles to non-rectangular iteration spaces in order to generate parallel loops. To exploit wavefront parallelism efficiently, all tiles whose coordinates have the same sum are assumed to lie on the same wavefront. To assign the parallelepiped tiles on each wavefront to different processors, the paper also proposes an improved block scheduling strategy.
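The wavefront grouping and per-wavefront processor assignment described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm. It assumes a rectangular grid of tile coordinates (the paper targets non-rectangular spaces), assumes that tiles on the same wavefront are independent, and uses plain contiguous block scheduling in place of the improved strategy, whose details the abstract does not give. The names `wavefronts` and `block_schedule` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def wavefronts(tile_counts):
    """Group tile coordinates by the sum of their coordinates.

    tile_counts gives the number of tiles along each dimension
    (assumption: a rectangular tile grid, for illustration only).
    Returns a list of wavefronts ordered by increasing coordinate sum.
    """
    fronts = defaultdict(list)
    for tile in product(*(range(n) for n in tile_counts)):
        fronts[sum(tile)].append(tile)
    return [fronts[w] for w in sorted(fronts)]


def block_schedule(tiles, num_procs):
    """Split tiles into num_procs contiguous blocks.

    Plain block scheduling (an assumption, not the paper's improved
    variant): block sizes differ by at most one tile.
    """
    base, extra = divmod(len(tiles), num_procs)
    blocks, start = [], 0
    for p in range(num_procs):
        size = base + (1 if p < extra else 0)
        blocks.append(tiles[start:start + size])
        start += size
    return blocks


if __name__ == "__main__":
    # Wavefronts run one after another; blocks within a wavefront
    # could run on different processors at the same time.
    for step, front in enumerate(wavefronts((2, 3))):
        print(step, block_schedule(front, num_procs=2))
```

In this sketch, wavefronts execute in order of increasing coordinate sum, and each processor receives one contiguous block of tiles from the current wavefront.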