POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Tiling multidimensional iteration spaces for nonshared memory machines
Proceedings of the 1991 ACM/IEEE conference on Supercomputing
Integration, the VLSI Journal
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Communication-minimal tiling of uniform dependence loops
Journal of Parallel and Distributed Computing
Determining the idle time of a tiling
Proceedings of the 24th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Optimal orthogonal tiling of 2-D iterations
Journal of Parallel and Distributed Computing
Automatic data layout for distributed-memory machines
ACM Transactions on Programming Languages and Systems (TOPLAS)
Selecting tile shape for minimal execution time
Proceedings of the eleventh annual ACM symposium on Parallel algorithms and architectures
On tiling space-time mapped loop nests
Proceedings of the thirteenth annual ACM symposium on Parallel algorithms and architectures
On Supernode Transformation with Minimized Total Running Time
IEEE Transactions on Parallel and Distributed Systems
On Time Optimal Supernode Shape
IEEE Transactions on Parallel and Distributed Systems
On the Parallel Execution Time of Tiled Loops
IEEE Transactions on Parallel and Distributed Systems
Determining the Idle Time of a Tiling: New Results
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Effective automatic parallelization of stencil computations
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Hi-index | 0.00 |
Most previous studies on tiling focus on the division of iteration space. However, on distributed memory parallel systems, the decomposition of computation and the distribution of data must be handled at the same time, in order to attain load balancing and to minimize data migration. In this paper, we formulate a 0-1 integer linear programming for the problem of globally optimal tiling to minimize the total execution time. To simplify the selection of tiling parameters, we restrict the tile shape to semi-oblique shape, and present two effective approaches to decide the tile shape for multi-dimensional semi-oblique shaped tiling. Besides, we present a tile-to-processor mapping scheme based on hyperplanes, which can express diverse parallelism and gain better performance than traditional methods. The experimentations with NPB2.3-serial SP and LU on Qsnet connected cluster achieved the average parallel efficiency of 87% and 73% respectively.