Minimizing Completion Time for Loop Tiling with Computation and Communication Overlapping

Authors:
Georgia Goumas
Affiliations:
-
Venue:
IPDPS '01 Proceedings of the 15th International Parallel and Distributed Processing Symposium (IPDPS'01) - Volume 1
Year:
2001

Citing 0
Cited 2

Enhancing the Performance of Tiled Loop Execution onto Clusters Using Memory Mapped Network Interfaces and Pipelined Schedules

IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
Hyperplane Grouping and Pipelined Schedules: How to Execute Tiled Loops Fast on Clusters of SMPs

The Journal of Supercomputing

Quantified Score

Hi-index	0.00

Visualization

Abstract

This paper proposes a new method for the problem of minimizing the execution time of nested for-loops using a tiling transformation. In our approach, we are interested not only in tile size and shape according to the required communication to computation ratio, but also in overallcompletion time. We select a time hyperplane to execute different tiles much more efficiently by exploiting the inherent overlapping between communication and computation phases among successive, atomic tile executions. We assign tiles to processors according to the tile space boundaries, thus considering the iteration space bounds. Our schedule considerably reduces overall completion time under the assumption that some part from every communication phase can be efficiently overlapped with atomic, pure tile computations.The overall schedule resembles a pipelined datapath where computations are not anymore interleaved with sends and receives to non-local processors. Experimental results in a cluster of Pentiums by using various MPI send primitives show that the total completion time is significantly reduced.