The simulated models and requirements of engineering programs such as computational fluid dynamics and structural mechanics codes grow more rapidly than single-processor performance. Automatic parallelisation seems the obvious approach for large, long-lived packages like PERMAS. Our approach is based on dynamic scheduling, which is more flexible than domain decomposition, is fully transparent to the end user, and achieves good speedups because it can extract parallelism where other methods cannot. In this paper we show that some preparatory steps on the large input matrices are needed for good performance. We present a new blocking approach that saves storage and shortens the critical path of the computation. We also propose a data-distribution step that guides the dynamic scheduler's decisions, so that efficient parallelisation can be achieved even on multiprocessors with slow interconnects. A final and important step is the interleaving of the array blocks distributed to different processors; this step is essential to expose the parallelism to the scheduler.