A bit-compatible parallelization for ILU(k) preconditioning

Authors:
Xin Dong;Gene Cooperman
Affiliations:
College of Computer Science, Northeastern University, Boston, MA;College of Computer Science, Northeastern University, Boston, MA
Venue:
Euro-Par'11 Proceedings of the 17th international conference on Parallel processing - Volume Part II
Year:
2011


Abstract

ILU(k) is a commonly used preconditioner for iterative linear solvers applied to sparse, non-symmetric systems. It is often preferred for its stability. We present TPILU(k), the first efficiently parallelized ILU(k) preconditioner that maintains this important stability property. Even better, TPILU(k) preconditioning produces an answer that is bit-compatible with sequential ILU(k) preconditioning. In terms of performance, TPILU(k) preconditioning is shown to run faster whenever more cores are made available to it, while remaining as stable as sequential ILU(k). This is in contrast to some competing methods that may become unstable if the degree of thread parallelism is raised too far. Where Block Jacobi ILU(k) fails in an application, it can be replaced by TPILU(k) in order to maintain good performance while also achieving full stability. As a further optimization, TPILU(k) offers an optional level-based incomplete inverse method as a fast approximation for the original ILU(k) preconditioned matrix. Although this enhancement is not bit-compatible with classical ILU(k), it is bit-compatible with the output from the single-threaded version of the same algorithm. In experiments on a 16-core computer, the enhanced TPILU(k)-based iterative linear solver performed up to 9 times faster. As we approach an era of many-core computing, the ability to efficiently take advantage of many cores will become ever more important.
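
For readers unfamiliar with the sequential baseline the abstract refers to, the following is a minimal sketch of textbook level-of-fill ILU(k). It is not TPILU(k) and not the authors' code: the names `ilu_k` and `reconstruct` are hypothetical, the matrix is stored densely for readability (a real implementation would use a sparse format), and nonzero pivots are assumed with no pivoting. A bit-compatible parallel version would have to return exactly the floating-point values this sequential loop produces.

```python
import math


def ilu_k(a, k):
    """Sequential level-based ILU(k) on a dense list-of-lists matrix.

    Assumptions (illustrative only): dense storage, nonzero pivots, no pivoting.
    Returns (lu, lev). lu holds the unit lower factor L strictly below the
    diagonal and U on and above it. lev holds each entry's level of fill.
    """
    n = len(a)
    lu = [row[:] for row in a]
    # Original nonzeros and the diagonal get level 0; everything else infinity.
    lev = [[0 if (a[i][j] != 0.0 or i == j) else math.inf for j in range(n)]
           for i in range(n)]
    for i in range(1, n):
        for p in range(i):
            if lev[i][p] > k:
                continue  # this entry is not kept, so it eliminates nothing
            lu[i][p] /= lu[p][p]  # multiplier stored in the L part
            for j in range(p + 1, n):
                if lev[p][j] > k:
                    continue  # dropped entries of row p are zero
                lu[i][j] -= lu[i][p] * lu[p][j]
                # Level of a fill entry: sum of the two source levels plus one.
                lev[i][j] = min(lev[i][j], lev[i][p] + lev[p][j] + 1)
        # Drop every entry of row i whose level exceeds k.
        for j in range(n):
            if lev[i][j] > k:
                lu[i][j] = 0.0
    return lu, lev


def reconstruct(lu):
    """Multiply the packed factors back together: returns L * U."""
    n = len(lu)
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            s = 0.0
            for m in range(min(i, j) + 1):
                l_im = 1.0 if m == i else lu[i][m]  # unit diagonal of L
                s += l_im * lu[m][j]
            out[i][j] = s
    return out
```

On a small arrowhead matrix (dense first row and column, diagonal elsewhere), `ilu_k(A, 0)` keeps only the sparsity pattern of `A`, while `ilu_k(A, 1)` admits all level-1 fill and, for that particular matrix, coincides with the complete LU factorization, so `reconstruct` recovers `A` up to rounding.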