POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Proceedings of the 1989 ACM/IEEE conference on Supercomputing
A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing
An optimizing Fortran D compiler for MIMD distributed-memory machines
Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Compiler cache optimizations for banded matrix problems
ICS '95 Proceedings of the 9th international conference on Supercomputing
Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing
Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems
Optimal Grain Size Computation for Pipelined Algorithms
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing - Volume I
Determining the Idle Time of a Tiling: New Results
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques
Resource-constrained scheduling of partitioned algorithms on processor arrays
PDP '95 Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing
(R) A Compile Time Partitioning Method for DOALL Loops on Distributed Memory Systems
ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3
Dynamic multi phase scheduling for heterogeneous cluste
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
In this paper we address the problem of tiling an iteration space so as to minimize the completion time of loops executed on multicomputers. Previous work on tiling assumes atomic execution of tiles in order to minimize synchronization costs. In this work, we remove the atomicity restriction so that parallelism internal to a tile can be exploited by overlapping computation with communication. The effectiveness of tiling then depends critically on the execution order of tasks within a tile. We present a theoretical framework based on equivalence classes that provides an optimal task ordering under assumptions of fixed and variable orderings of tasks in individual tiles. The framework handles loop-invariant dependences that are unknown at compile time by efficiently generating optimal task orderings at run time, which results in lower loop completion times. Our solution improves on previous approaches [Proceedings of Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer Society Press, 1995, pp. 571-580; Proceedings of the International Conference on Application Specific Array Processors (ASAP), 1993, pp. 53-64]. Unlike those approaches, ours is optimal for all problem instances with a single dependence vector in one dimension. We also show a performance improvement over these previous results.
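To make the role of task ordering concrete, the following is a minimal sketch of a toy model and is not the framework of the paper. It assumes a one-dimensional loop with a single dependence distance `d`, two tiles of `T` iterations each on two processors, unit cost per iteration, and a fixed latency `comm` that is fully overlapped with computation. The function names, the cost model, and the grouping of iterations by residue modulo `d` are all illustrative assumptions.

```python
def atomic_time(T, comm):
    # Atomic tiles: tile 1 starts only after all of tile 0 has run
    # and its boundary data has arrived in one message.
    return T + comm + T


def chain_order(lo, hi, d):
    # Hypothetical ordering: run each dependence chain (same residue mod d)
    # to the end of the tile before starting the next chain.
    # Assumes lo is a multiple of d so chains line up across tiles.
    return [i for r in range(d) for i in range(lo + r, hi, d)]


def completion_time(order0, order1, d, T, comm):
    # Non-atomic tiles: every value is sent as soon as it is computed and
    # reaches the other processor `comm` time units later.
    done = {}  # iteration -> finish time
    t = 0
    for i in order0:  # tile 0 holds iterations 0 .. T-1
        ready = done[i - d] if i - d >= 0 else 0
        t = max(t, ready) + 1
        done[i] = t
    t = 0
    for i in order1:  # tile 1 holds iterations T .. 2T-1
        src = i - d
        # Values produced in tile 0 pay the communication latency.
        ready = done[src] + comm if src < T else done[src]
        t = max(t, ready) + 1
        done[i] = t
    return t


T, d, comm = 8, 2, 3
natural = completion_time(list(range(T)), list(range(T, 2 * T)), d, T, comm)
chained = completion_time(chain_order(0, T, d), chain_order(T, 2 * T, d), d, T, comm)
print(atomic_time(T, comm), natural, chained)
```

Under these assumptions, chain-by-chain ordering produces the first boundary value early, so the second processor can begin work while the first is still computing its remaining chains. The natural order produces boundary values last and gains little over atomic execution.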