Parallel sparse LU factorization on different message passing platforms

Authors:
Kai Shen
Affiliations:
Department of Computer Science, University of Rochester, Rochester, NY
Venue:
Journal of Parallel and Distributed Computing
Year:
2006

Abstract

Several message passing-based parallel solvers have been developed for general (non-symmetric) sparse LU factorization with partial pivoting. Existing solvers were mostly deployed and evaluated on parallel computing platforms with high message passing performance (e.g., 1-10 µs in message latency and 100-1000 Mbytes/s in message throughput), while little attention has been paid to slower platforms. This paper investigates techniques that are specifically beneficial for LU factorization on platforms with slow message passing. In the context of the S+ distributed memory solver, we find that a significant reduction in the application's message passing overhead can be attained at the cost of extra computation and slightly weakened numerical stability. In particular, we propose batch pivoting, which makes pivot selections in groups through speculative factorization and thus substantially decreases the inter-processor synchronization granularity. We experimented on three message passing platforms with different communication speeds. While the proposed techniques provide no performance benefit and even slightly weaken numerical stability on an IBM Regatta multiprocessor with fast message passing, they improve performance on our test matrices by 15-460% on an Ethernet-connected 16-node PC cluster. Given the different tradeoffs of communication-reduction techniques on different message passing platforms, we also propose a sampling-based runtime application adaptation approach that automatically determines whether these techniques should be employed for a given platform and input matrix.
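
The idea of selecting pivots in groups can be illustrated with a small single-process sketch. It is a toy dense model, not the S+ implementation: the function names, the `threshold` acceptance test, the fallback to per-column partial pivoting, and the `sync_rounds` counter (a stand-in for inter-processor synchronizations) are all assumptions made for illustration. Each batch of columns is factored speculatively with pivots restricted to the batch's own rows; if every pivot is large enough relative to its column, the batch costs one synchronization round, otherwise the work is discarded and redone column by column.

```python
def eliminate(lu, perm, k, p):
    """Swap row p into position k, then eliminate column k below the diagonal."""
    lu[k], lu[p] = lu[p], lu[k]
    perm[k], perm[p] = perm[p], perm[k]
    if lu[k][k] == 0.0:
        raise ValueError("matrix is singular")
    for i in range(k + 1, len(lu)):
        lu[i][k] /= lu[k][k]  # multiplier, stored in place as an entry of L
        for j in range(k + 1, len(lu)):
            lu[i][j] -= lu[i][k] * lu[k][j]


def lu_batch_pivoting(a, batch=4, threshold=0.1):
    """Toy dense LU (PA = LU) choosing pivots one batch of columns at a time.

    Returns (lu, perm, sync_rounds). sync_rounds is a hypothetical count of
    pivot-selection synchronizations: one per accepted batch, plus one per
    column whenever a batch is rejected and redone with partial pivoting.
    Assumes 0 < threshold <= 1.
    """
    n = len(a)
    lu = [[float(x) for x in row] for row in a]
    perm = list(range(n))
    sync_rounds = 0
    for start in range(0, n, batch):
        end = min(start + batch, n)
        # Snapshot so a failed speculation can be undone (wasteful, but simple).
        saved_lu, saved_perm = [row[:] for row in lu], perm[:]
        sync_rounds += 1
        accepted = True
        for k in range(start, end):
            col_max = max(abs(lu[i][k]) for i in range(k, n))
            # Speculation: pivot candidates are limited to rows of this batch.
            p = max(range(k, end), key=lambda i: abs(lu[i][k]))
            if abs(lu[p][k]) < threshold * col_max:
                accepted = False
                break
            eliminate(lu, perm, k, p)
        if not accepted:
            # Fall back to ordinary partial pivoting: one round per column.
            lu, perm = saved_lu, saved_perm
            for k in range(start, end):
                p = max(range(k, n), key=lambda i: abs(lu[i][k]))
                sync_rounds += 1
                eliminate(lu, perm, k, p)
    return lu, perm, sync_rounds


def max_residual(a, lu, perm):
    """Largest |(PA - LU)[i][j]|; L has a unit diagonal stored implicitly."""
    n = len(a)
    worst = 0.0
    for i in range(n):
        for j in range(n):
            s = sum((lu[i][k] if k < i else 1.0) * lu[k][j]
                    for k in range(min(i, j) + 1))
            worst = max(worst, abs(a[perm[i]][j] - s))
    return worst
```

In this model, per-column partial pivoting on an n-column matrix always uses n rounds, while accepted batches of size b use about n/b. A smaller `threshold` accepts more batches and so saves more rounds, but admits smaller pivots, which mirrors the tradeoff between communication and numerical stability described in the abstract.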