Global Tiling for Communication Minimal Parallelization on Distributed Memory Systems

Authors:
Lei Liu; Li Chen; Chengyong Wu; Xiao-Bing Feng
Affiliations:
Lei Liu: Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080, and Graduate School of the Chinese Academy of Sciences, Beijing, China ...
Li Chen, Chengyong Wu, Xiao-Bing Feng: Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 100080
Venue:
Euro-Par '08: Proceedings of the 14th International Euro-Par Conference on Parallel Processing
Year:
2008

Abstract

Most previous studies on tiling focus on partitioning the iteration space. On distributed-memory parallel systems, however, the decomposition of computation and the distribution of data must be handled together in order to balance the load and minimize data migration. In this paper, we formulate globally optimal tiling, which minimizes the total execution time, as a 0-1 integer linear programming problem. To simplify the selection of tiling parameters, we restrict tiles to semi-oblique shapes and present two effective approaches for determining the tile shape in multi-dimensional semi-oblique tiling. In addition, we present a hyperplane-based tile-to-processor mapping scheme that can express diverse forms of parallelism and achieves better performance than traditional methods. Experiments with the NPB2.3-serial SP and LU benchmarks on a Qsnet-connected cluster achieved average parallel efficiencies of 87% and 73%, respectively.
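The abstract does not spell out the paper's hyperplane-based mapping, so the following is only a minimal sketch of the general idea of hyperplane (wavefront) execution of tiles, together with the usual definition of parallel efficiency (sequential time divided by processors times parallel time). All of it rests on hypothetical assumptions that are not taken from the paper: a 2-D grid of n1 × n2 tiles, tile-level dependences (1, 0) and (0, 1), unit cost per tile, free communication, the schedule hyperplane h = (1, 1), and a cyclic mapping of the t2 coordinate onto processors. The names `wavefront_execute` and `parallel_efficiency` are illustrative.

```python
"""Illustrative sketch only: hyperplane (wavefront) execution of a 2-D tile grid.

Hypothetical assumptions, not taken from the paper:
  * tile-level dependences (1, 0) and (0, 1),
  * every tile costs one time unit and communication is free,
  * tiles are ordered by the schedule hyperplane h = (1, 1), i.e. by t1 + t2,
  * tile (t1, t2) runs on processor t2 % num_procs (cyclic mapping).
"""


def wavefront_execute(n1, n2, num_procs):
    """Return {tile: finish_time} for a greedy execution of the tile grid."""
    # Visit tiles hyperplane by hyperplane; both predecessors of a tile
    # lie on an earlier hyperplane, so they are always scheduled first.
    tiles = sorted(
        ((t1, t2) for t1 in range(n1) for t2 in range(n2)),
        key=lambda t: (t[0] + t[1], t[1]),
    )
    finish = {}
    proc_free = [0] * num_procs  # time at which each processor becomes idle
    for t1, t2 in tiles:
        proc = t2 % num_procs
        # A tile may start once both dependence predecessors have finished ...
        ready = max(finish.get((t1 - 1, t2), 0), finish.get((t1, t2 - 1), 0))
        # ... and its processor has finished the tile assigned to it before.
        start = max(ready, proc_free[proc])
        finish[(t1, t2)] = start + 1
        proc_free[proc] = start + 1
    return finish


def parallel_efficiency(num_tiles, num_procs, makespan):
    """Sequential time (one unit per tile) divided by processors x parallel time."""
    return num_tiles / (num_procs * makespan)


if __name__ == "__main__":
    for procs in (1, 2, 4):
        finish = wavefront_execute(4, 4, procs)
        makespan = max(finish.values())
        eff = parallel_efficiency(16, procs, makespan)
        print(f"P={procs}: makespan={makespan}, efficiency={eff:.2f}")
```

With 4 processors on a 4 × 4 grid, each processor owns one column and tile (t1, t2) finishes at time t1 + t2 + 1, so pipeline fill and drain give a makespan of 7 and an efficiency of 16/28 ≈ 0.57. In this toy model, efficiency improves as the grid grows longer along t1 relative to the number of processors.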