Run-time Techniques for Exploiting Irregular Task Parallelism on Distributed Memory Architectures

  • Authors:
  • Cong Fu; Tao Yang

  • Affiliations:
  • -;-

  • Venue:
  • -
  • Year:
  • 1995

Abstract

Automatic scheduling for directed acyclic graphs (DAGs) and its application to coarse-grained irregular problems such as large n-body simulations have been studied in the literature. However, solving irregular problems with mixed granularities, such as sparse matrix factorization, is challenging because it requires efficient run-time support for executing the DAG schedule. In this paper, we investigate run-time compilation and supporting techniques for executing general asynchronous DAG schedules on distributed memory machines. Our solution tightly integrates the run-time scheme with a fast communication mechanism to eliminate unnecessary overhead in message buffering and copying, and it takes advantage of task dependence properties to ensure the correctness of execution. We demonstrate the application of this scheme to sparse Cholesky and LU factorization, for which actual speedups have been hard to obtain in the literature. Our experiments on the Meiko CS-2 show that the automatically scheduled code achieves scalable performance on these problems and that the run-time overhead is small compared to the total execution time.
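
To make the idea of executing an asynchronous DAG schedule concrete, the sketch below simulates dependence-count-driven execution: each processor owns the tasks assigned to it by the schedule, and a task fires only after all of its predecessors have completed and their results have "arrived". This is a minimal illustration of the general technique, not the authors' run-time system; it omits the fast communication layer, message-buffer management, and correctness machinery the paper describes, and names such as Task and run_schedule are hypothetical.

```python
from collections import defaultdict, deque

class Task:
    def __init__(self, name, owner, action):
        self.name = name        # task identifier
        self.owner = owner      # processor the schedule assigns this task to
        self.action = action    # work to perform (a callable here)
        self.succs = []         # tasks that consume this task's result
        self.pending = 0        # dependence counter: predecessors not yet done

def add_edge(src, dst):
    """Record a dependence edge src -> dst of the task DAG."""
    src.succs.append(dst)
    dst.pending += 1

def run_schedule(tasks, num_procs):
    """Simulate asynchronous execution: in each sweep, every processor may
    fire one of its tasks whose dependence counter has reached zero."""
    ready = defaultdict(deque)              # per-processor ready queues
    for t in tasks:
        if t.pending == 0:
            ready[t.owner].append(t)
    done = 0
    while done < len(tasks):
        progressed = False
        for p in range(num_procs):
            if ready[p]:
                t = ready[p].popleft()
                t.action()
                done += 1
                progressed = True
                # "Send" the result: decrement successors' counters; a
                # successor on another processor models a remote message.
                for s in t.succs:
                    s.pending -= 1
                    if s.pending == 0:
                        ready[s.owner].append(s)
        if not progressed:
            raise RuntimeError("schedule deadlocked: cyclic or missing tasks")

# Tiny example: a diamond-shaped DAG mapped onto two processors.
a = Task("A", 0, lambda: print("A on P0"))
b = Task("B", 0, lambda: print("B on P0"))
c = Task("C", 1, lambda: print("C on P1"))
d = Task("D", 1, lambda: print("D on P1"))
for src, dst in [(a, b), (a, c), (b, d), (c, d)]:
    add_edge(src, dst)
run_schedule([a, b, c, d], num_procs=2)
```

In an actual distributed-memory setting, the decrement of a remote successor's counter corresponds to a message delivery, which is where the paper's integration with a low-overhead communication mechanism (avoiding extra buffering and copying) matters.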