In this paper we present and evaluate the performance of two different strategies for the deployment of parallel multifrontal and multiple frontal sparse linear solvers in the context of a parallel finite element code. Direct sparse linear solvers are based on sophisticated reorganisations of the standard Gaussian elimination algorithm that exploit the sparsity of the matrix and reduce the amount of fill-in. Such codes can be successfully applied to very large linear systems, and are especially effective when a sparse linear system must be solved for multiple right-hand sides. Unfortunately, many important applications, such as finite element solutions of non-linear, transient problems, require repeated factorisation of the coefficient matrix. In such cases the only way to achieve good performance is to parallelise both the computation of the finite element matrices and the linear system solution phase. We have developed two different designs for deploying parallel multifrontal and multiple frontal sparse linear solvers in this context, each supporting three different strategies for the assembly of the global data. These designs are suitable for parallel and heterogeneous architectures. Experiments confirm the high efficiency, low communication cost, and reduced initial memory requirements of our deployment designs compared to a standard deployment strategy.
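The role of fill-in mentioned above can be illustrated with a small, self-contained sketch (not the paper's solver; the data structures and the arrow-matrix example are our own illustrative choices). A symbolic Gaussian elimination over a sparsity pattern counts the new nonzeros created at each pivot step; eliminating a densely connected node first fills the remaining block completely, while eliminating it last creates no fill at all — which is why direct solvers reorder the matrix before factorising:

```python
def symbolic_fill(n, pattern, order):
    """Count fill-in entries created by Gaussian elimination in the given pivot order.

    pattern: set of (row, col) pairs holding nonzeros (diagonal assumed nonzero).
    order:   permutation of range(n) giving the elimination sequence.
    """
    # Relabel indices so that order[k] is the variable eliminated at step k.
    pos = {v: k for k, v in enumerate(order)}
    pat = {(pos[i], pos[j]) for (i, j) in pattern}
    fill = 0
    for k in range(n):
        # Rows below and columns right of the pivot that are nonzero in column/row k.
        rows = [i for i in range(k + 1, n) if (i, k) in pat]
        cols = [j for j in range(k + 1, n) if (k, j) in pat]
        # The elimination update A[i,j] -= A[i,k] * A[k,j] / A[k,k]
        # turns every such (i, j) position nonzero.
        for i in rows:
            for j in cols:
                if (i, j) not in pat:
                    pat.add((i, j))
                    fill += 1
    return fill

# Arrow matrix: dense first row and column plus the diagonal.
n = 5
arrow = ({(i, i) for i in range(n)}
         | {(0, j) for j in range(n)}
         | {(i, 0) for i in range(n)})

bad = symbolic_fill(n, arrow, list(range(n)))         # dense node eliminated first
good = symbolic_fill(n, arrow, list(range(n))[::-1])  # dense node eliminated last
print(bad, good)  # → 12 0
```

Ordering heuristics such as approximate minimum degree automate this choice for general patterns; multifrontal codes then organise the resulting elimination into dense frontal matrices that can be factorised with efficient level 3 BLAS kernels.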