POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing
Integration, the VLSI Journal
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
A Loop Transformation Theory and an Algorithm to Maximize Parallelism
IEEE Transactions on Parallel and Distributed Systems
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
On Supernode Transformation with Minimized Total Running Time
IEEE Transactions on Parallel and Distributed Systems
On Time Optimal Supernode Shape
IEEE Transactions on Parallel and Distributed Systems
Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors
Parallel Scientific Computing in C++ and MPI
Parallel Scientific Computing in C++ and MPI
An efficient code generation technique for tiled iteration spaces
IEEE Transactions on Parallel and Distributed Systems
On-chip cache hierarchy-aware tile scheduling for multicore machines
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Hi-index | 0.00 |
In this paper we revisit the tile-shape selection problem, that has been extensively discussed in bibliography. An efficient approach is proposed for the selection of a suitable tile shape, based on the minimization of the process communication volume. We consider the large family of applications that arise from the discretization of partial differential equations (PDEs). Practical experience has shown that for such applications and distributed memory architectures, minimizing the total communication volume is more important than minimizing the total number of parallel execution steps. We formulate a new method to determine an appropriate communication-aware tile shape, i.e. the one that reduces the communication volume for a fixed number of processes. Our approach is equivalent to defining a proper Cartesian process grid with MPI_Cart_Create, which means that it can be incorporated in applications in a straightforward manner. Our experimental results illustrate that by selecting the tile shape with the proposed method, the total parallel execution time is significantly reduced due to the minimization of the communication volume, despite the fact that a few more parallel execution steps are required.