Adapting a parallel sparse direct solver to architectures with clusters of SMPs

Authors:
Patrick R. Amestoy;Iain S. Duff;Stéphane Pralet;Christof Vömel
Affiliations:
ENSEEIHT, 2 rue Camichel, BP 7122--F 31071 Toulouse Cedex 7, France;CERFACS, Toulouse, and Atlas Centre, RAL, Oxon OX11 0QX, UK;CERFACS, 42, av. G. Coriolis, 31057 Toulouse Cedex 01, France;CERFACS, 42, av. G. Coriolis, 31057 Toulouse Cedex 01, France
Venue:
Parallel Computing - Special issue: Parallel and distributed scientific and engineering computing
Year:
2003

Abstract

We consider the direct solution of general sparse linear systems based on a multifrontal method. The approach combines partial static scheduling of the task dependency graph during the symbolic factorization and distributed dynamic scheduling during the numerical factorization to balance the work among the processes of a distributed memory computer. We show that to address clusters of Symmetric Multi-Processor (SMP) architectures, and more generally non-uniform memory access multiprocessors, our algorithms for both the static and the dynamic scheduling need to be revisited to take account of the non-uniform cost of communication. The performance analysis on an IBM SP3 with 16 processors per SMP node and up to 128 processors shows that we can significantly reduce both the amount of inter-node communication and the solution time.
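
The idea of making dynamic scheduling aware of non-uniform communication cost can be illustrated with a minimal sketch. This is not the algorithm from the paper: the `Proc` record, the `select_workers` function, and the additive `inter_node_penalty` cost model are all hypothetical, chosen only to show how penalizing processes on a different SMP node than the task owner can shift the choice of helpers toward intra-node ones.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    rank: int    # process identifier
    node: int    # SMP node hosting this process
    load: float  # pending work, in arbitrary units


def select_workers(master, procs, n_workers, inter_node_penalty):
    """Pick n_workers helper processes for a task owned by `master`.

    Candidates are ranked by their current load, plus a fixed penalty
    when they sit on a different SMP node than the master. The additive
    penalty is an assumed cost model, used here for illustration only.
    """
    def cost(p):
        off_node = p.node != master.node
        return p.load + (inter_node_penalty if off_node else 0.0)

    candidates = [p for p in procs if p.rank != master.rank]
    # Ties are broken by rank so the result is deterministic.
    return sorted(candidates, key=lambda p: (cost(p), p.rank))[:n_workers]


if __name__ == "__main__":
    procs = [Proc(0, 0, 5.0), Proc(1, 0, 4.0), Proc(2, 1, 3.0), Proc(3, 1, 1.0)]
    master = procs[0]
    for penalty in (0.0, 2.5, 10.0):
        chosen = select_workers(master, procs, 2, penalty)
        print(penalty, [p.rank for p in chosen])
```

With no penalty the two least-loaded processes are chosen regardless of location, so both helpers are off-node. As the penalty grows, the on-node process is preferred even though it carries more load, which trades some load balance for less inter-node traffic.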