Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
Optimal task scheduling at run time to exploit intra-tile parallelism
Parallel Computing
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
In the framework of fully permutable loops, tiling has been studied extensively as a source-to-source program transformation. We build upon recent results by Hogsted, Carter, and Ferrante, who aim to determine the cumulative idle time spent by all processors while executing the partitioned (tiled) computation domain. We propose new, much shorter proofs of all their results and extend them in several important directions. More precisely, we provide an exact solution for all values of the rise parameter, which relates the shape of the iteration space to that of the tiles, and for all possible distributions of the tiles to processors. In contrast, Hogsted, Carter, and Ferrante deal only with a limited number of cases and provide upper bounds rather than exact formulas.