Selecting the tile shape to reduce the total communication volume

Authors:
Nikolaos Drosinos, Georgios Goumas, Nectarios Koziris
Affiliations:
National Technical University of Athens, School of Electrical and Computer Engineering, Zografou, Greece (all three authors)
Venue:
IPDPS '06: Proceedings of the 20th International Conference on Parallel and Distributed Processing
Year:
2006

Abstract

In this paper we revisit the tile-shape selection problem, which has been extensively discussed in the literature. We propose an efficient approach for selecting a suitable tile shape, based on minimizing the inter-process communication volume. We consider the large family of applications that arise from the discretization of partial differential equations (PDEs). Practical experience has shown that, for such applications on distributed-memory architectures, minimizing the total communication volume is more important than minimizing the total number of parallel execution steps. We formulate a new method to determine an appropriate communication-aware tile shape, i.e., one that reduces the communication volume for a fixed number of processes. Our approach is equivalent to defining a proper Cartesian process grid with MPI_Cart_Create, which means that it can be incorporated into applications in a straightforward manner. Our experimental results show that, by selecting the tile shape with the proposed method, the total parallel execution time is significantly reduced due to the minimization of the communication volume, even though a few more parallel execution steps are required.
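The abstract frames tile-shape selection as choosing a Cartesian process grid that minimizes communication volume for a fixed number of processes. As a rough illustration only, and not the authors' formulation, the sketch below assumes a rectangular N_x × N_y × N_z iteration space with unit forward dependencies along every axis (a simple PDE-stencil stand-in), block-distributed over a p_x × p_y process grid while the third axis is traversed sequentially. Under that assumed model, each of the p_i − 1 internal cut planes normal to axis i carries the product of the other extents, and the sketch brute-forces every factorization of the process count. The function names (`factorizations`, `comm_volume`, `best_grid`) and the extents are hypothetical.

```python
import math
from typing import Iterator, List, Optional, Sequence, Tuple


def factorizations(p: int, d: int) -> Iterator[Tuple[int, ...]]:
    """Yield every ordered d-tuple of positive integers whose product is p."""
    if d == 1:
        yield (p,)
        return
    for f in range(1, p + 1):
        if p % f == 0:
            for rest in factorizations(p // f, d - 1):
                yield (f,) + rest


def comm_volume(extents: Sequence[int], dims: Sequence[int]) -> int:
    """Total elements crossing process boundaries under the ASSUMED model:
    unit forward dependencies along every axis, so each of the (p_i - 1)
    internal cut planes normal to axis i carries prod(N_j, j != i) elements."""
    total = math.prod(extents)
    return sum((p - 1) * (total // n) for n, p in zip(extents, dims))


def best_grid(extents: Sequence[int], nprocs: int,
              fixed: Sequence[int] = ()) -> Optional[Tuple[int, List[int]]]:
    """Exhaustively search grids of nprocs processes and return (volume, dims)
    with minimal comm_volume, or None if no grid fits. Axes listed in `fixed`
    (e.g. the axis traversed sequentially in a pipelined schedule) stay at 1."""
    d = len(extents)
    free = [i for i in range(d) if i not in fixed]
    if not free:
        raise ValueError("at least one axis must be partitionable")
    best: Optional[Tuple[int, List[int]]] = None
    for f in factorizations(nprocs, len(free)):
        dims = [1] * d
        for axis, p in zip(free, f):
            dims[axis] = p
        if any(p > n for p, n in zip(dims, extents)):
            continue  # more processes than points along some axis
        v = comm_volume(extents, dims)
        if best is None or v < best[0]:
            best = (v, dims)
    return best


if __name__ == "__main__":
    extents = [512, 2048, 1024]  # hypothetical N_x, N_y, N_z
    vol, dims = best_grid(extents, 16, fixed=(2,))
    print(dims, vol)                          # [2, 8, 1] 5767168
    print(comm_volume(extents, [4, 4, 1]))    # 7864320 for a balanced 4 x 4 grid
```

In this made-up 512 × 2048 × 1024 case, the search picks a 2 × 8 grid, proportional to the 512:2048 extents, whereas a balanced 4 × 4 grid (the kind of factorization `MPI_Dims_create` aims for, since it sees only the process count) moves about 36% more data. The resulting `dims` list is what one would pass as the `dims` argument of the MPI Cartesian-grid constructor (`MPI_Cart_create` in the MPI standard). The model ignores pipeline fill and the number of parallel execution steps, so it captures only the communication side of the trade-off described above.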