The simulated models and requirements of engineering programs such as computational fluid dynamics and structural mechanics codes grow more rapidly than single-processor performance. Automatic parallelisation seems the obvious approach for large, long-lived packages like PERMAS. Our approach is based on dynamic scheduling, which is more flexible than domain decomposition, is fully transparent to the end user, and achieves good speedups because it can extract parallelism where other methods cannot. In this paper we show that some preparatory steps on the large input matrices are needed for good performance. We present a new blocking approach that saves storage and shortens the critical path of the computation. We also propose a data-distribution step that guides the dynamic scheduler's decisions, so that efficient parallelisation can be achieved even on multiprocessors with slow interconnects. A final and important step is the interleaving of the array blocks distributed to different processors; this step is essential to expose the parallelism to the scheduler.