Optimal task scheduling at run time to exploit intra-tile parallelism

Authors:
Fabrice Rastello;Amit Rao;Santosh Pande
Affiliations:
LIP, Ecole Normale Supérieure de Lyon, 46 allée d'Italie, 69364 Lyon Cedex 07, France;College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA;College of Computing, Georgia Institute of Technology, 801 Atlantic Drive, Atlanta, GA
Venue:
Parallel Computing
Year:
2003

Citing 15

Supernode partitioning
POPL '88 Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages

More iteration space tiling
Proceedings of the 1989 ACM/IEEE conference on Supercomputing

A data locality optimizing algorithm
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation

Global optimizations for parallelism and locality on scalable parallel machines
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation

Reducing data communication overhead for DOACROSS loop nests
ICS '94 Proceedings of the 8th international conference on Supercomputing

An optimizing Fortran D compiler for MIMD distributed-memory machines

Tile size selection using cache organization and data layout
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation

Compiler cache optimizations for banded matrix problems
ICS '95 Proceedings of the 9th international conference on Supercomputing

Optimal tile size adjustment in compiling general DOACROSS loop nests
ICS '95 Proceedings of the 9th international conference on Supercomputing

Partitioning and Labeling of Loops by Unimodular Transformations
IEEE Transactions on Parallel and Distributed Systems

Automatic Partitioning of Parallel Loops and Data Arrays for Distributed Shared-Memory Multiprocessors
IEEE Transactions on Parallel and Distributed Systems

Optimal Grain Size Computation for Pipelined Algorithms
Euro-Par '96 Proceedings of the Second International Euro-Par Conference on Parallel Processing - Volume I

Determining the Idle Time of a Tiling: New Results
PACT '97 Proceedings of the 1997 International Conference on Parallel Architectures and Compilation Techniques

Resource-constrained scheduling of partitioned algorithms on processor arrays
PDP '95 Proceedings of the 3rd Euromicro Workshop on Parallel and Distributed Processing

(R) A Compile Time Partitioning Method for DOALL Loops on Distributed Memory Systems
ICPP '96 Proceedings of the 1996 International Conference on Parallel Processing - Volume 3

Cited 1

Dynamic multi phase scheduling for heterogeneous cluste
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing

Abstract

In this paper we address iteration space tiling as a means of minimizing the completion time of loops executed on multicomputers. Previous work on tiling assumes that tiles execute atomically in order to minimize synchronization costs. We remove this atomicity restriction so that the parallelism inside a tile can be exploited by overlapping computation with communication. The effectiveness of tiling then depends critically on the order in which the tasks within a tile are executed. We present a theoretical framework, based on equivalence classes, that provides an optimal task ordering under two assumptions: a fixed ordering of tasks shared by all tiles, and orderings that may vary from tile to tile. The framework handles loop-invariant dependences that are unknown at compile time by efficiently generating optimal task orderings at run time, which results in lower loop completion times. Our solution improves on previous approaches [Proceedings of Euromicro Workshop on Parallel and Distributed Processing, IEEE Computer Society Press, 1995, pp. 571-580; Proceedings of the International Conference on Application Specific Array Processors (ASAP), 1993, pp. 53-64]. Unlike those approaches, ours is optimal for all problem instances with one dependence vector in one dimension. We also show a performance improvement over these previous results.
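The effect of task order inside a non-atomic tile can be illustrated with a small Python sketch. It is a toy timing model made up for this illustration and is not the framework or the algorithm of the paper. The assumptions are: a one-dimensional loop in which iteration i needs the result of iteration i - dep, one tile per processor, unit-time tasks, a fixed message latency comm that does not occupy the processor, and 1 <= dep <= tile_size. The function name completion_time and all of its parameters are hypothetical.

```python
def completion_time(n_tiles, tile_size, dep, comm, order=None):
    """Finish time of a tiled 1-D loop under a toy cost model (an assumption).

    Iteration i depends on iteration i - dep. Each tile runs on its own
    processor and every task costs one time unit. A value that crosses a
    tile boundary arrives `comm` time units after it was produced.

    order=None  -> atomic tiles: a tile starts only after the whole
                   previous tile has finished and its message has arrived.
    order=[...] -> non-atomic tiles: a permutation of the local offsets
                   0..tile_size-1 giving the execution order inside each
                   tile. It must list offset k - dep before offset k.
    """
    finish = {}      # iteration index -> time at which its result exists
    prev_end = 0     # finish time of the previous tile
    for tile in range(n_tiles):
        first = tile * tile_size
        if order is None:
            # Atomic: wait for the complete previous tile plus one message.
            clock = prev_end + comm if tile > 0 else 0
            for i in range(first, first + tile_size):
                clock += 1
                finish[i] = clock
        else:
            # Non-atomic: each task waits only for its own input.
            clock = 0
            for offset in order:
                i = first + offset
                src = i - dep
                if src >= 0:
                    latency = comm if src < first else 0
                    clock = max(clock, finish[src] + latency)
                clock += 1
                finish[i] = clock
        prev_end = clock
    return prev_end


if __name__ == "__main__":
    args = (4, 8, 3, 2)  # 4 tiles of 8 iterations, dep = 3, comm = 2
    print(completion_time(*args))                                 # 38
    print(completion_time(*args, order=list(range(8))))           # 32
    print(completion_time(*args, order=[0, 3, 6, 1, 4, 7, 2, 5])) # 38
```

In this model, executing the tasks of each tile in increasing index order hides the message latency entirely (32 time units instead of 38 with atomic tiles). Executing them one dependence chain at a time is also a valid order, but it produces the boundary values late and gives back the whole gain.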