Efficient Run-Time Support for Irregular Task Computations with Mixed Granularities

Authors:
Cong Fu;Tao Yang
Venue:
IPPS '96 Proceedings of the 10th International Parallel Processing Symposium
Year:
1996

Abstract

Many irregular scientific computing problems can be modeled by directed acyclic task graphs (DAGs). In this paper, we present an efficient run-time system for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and takes advantage of task dependence properties to ensure correct execution. We demonstrate this scheme on sparse LU and Cholesky factorizations, for which actual speedups have been hard to obtain in the literature because parallelism in these problems is irregular and limited. Our experiments on the Meiko CS-2 show promising results for our approach in exploiting irregular task parallelism with mixed granularities.
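
To make the idea of an asynchronous DAG schedule concrete, the following is a minimal illustrative sketch, not the authors' run-time system. It assumes a static schedule given as one ordered task list per processor and simulates execution in a single process: each processor runs its next task as soon as all of that task's predecessors have completed. The names `run_schedule`, `deps`, and `schedule`, and the task labels, are hypothetical; real message passing, buffering, and the Meiko CS-2 communication layer are not modeled.

```python
def run_schedule(deps, schedule):
    """Simulate dependence-driven execution of a static DAG schedule.

    deps:     dict mapping each task to the set of tasks it depends on
              (hypothetical input format, assumed for this sketch)
    schedule: list of per-processor task lists, each in execution order
    Returns the global order in which tasks completed.
    """
    done = set()                      # tasks whose results are available
    order = []                        # global completion order
    next_idx = [0] * len(schedule)    # next task position on each processor
    total = sum(len(tasks) for tasks in schedule)

    while len(order) < total:
        progressed = False
        # Poll the processors round-robin; no global barrier is used.
        for p, tasks in enumerate(schedule):
            i = next_idx[p]
            if i < len(tasks) and deps.get(tasks[i], set()) <= done:
                done.add(tasks[i])
                order.append(tasks[i])
                next_idx[p] += 1
                progressed = True
        if not progressed:
            # No processor can advance: the per-processor orderings
            # contradict the dependence edges.
            raise ValueError("schedule is inconsistent with the task graph")
    return order


# A small diamond-shaped task graph: T1 -> {T2, T3} -> T4.
deps = {"T1": set(), "T2": {"T1"}, "T3": {"T1"}, "T4": {"T2", "T3"}}
schedule = [["T1", "T2", "T4"], ["T3"]]   # two processors
print(run_schedule(deps, schedule))
```

In this sketch a processor never waits at a global synchronization point; it only waits for the specific predecessors of its next task, which is the property that lets tasks of different granularities overlap.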