Parallel threshold-based ILU factorization

  • Authors:
  • George Karypis;Vipin Kumar

  • Affiliations:
  • University of Minnesota, Minneapolis, MN;University of Minnesota, Minneapolis, MN

  • Venue:
  • SC '97 Proceedings of the 1997 ACM/IEEE conference on Supercomputing
  • Year:
  • 1997

Quantified Score

Hi-index 0.00

Visualization

Abstract

The sparse linear systems arising in finite element applications are commonly solved using iterative methods. In particular, as the size of these problems increases, the increased computational and memory requirements of these problems render in-core direct solution methods unusable, leaving iterative methods as the only viable alternative for solving these problems in core.The major computational kernels of an iterative method are (i) computation of preconditioner, (ii) multiplication of a sparse matrix with a vector, and (iii) application of the preconditioner. Threshold-based incomplete LU factorization have been found to be quite effective in preconditioning iterative system solvers [14]. However, because these factorizations allow the fill elements to be created dynamically, their parallel formulations had not been well understood, and they have been considered to be unsuitable for distributed-memory parallel computers [13]. Furthermore, solution of the resulting sparse triangular system (which is required for the application of the preconditioner) is generally more difficult to parallelize than the multiplication of a sparse matrix with a vector.In this paper we show that highly parallel graph partitioning algorithms in conjunction with parallel algorithms for computing maximal independent sets can be used to develop scalable parallel formulations of incomplete factorization algorithms. We present a highly parallel formulation of the ILUT factorization algorithm [14] for distributed memory parallel computers. This algorithm uses our parallel multilevel k-way graph partitioning algorithm [6,8] in conjunction with a parallel maximal independent subset algorithm to parallelize both the factorization as well as the solution of the resulting triangular factors. We also present a modified ILUT factorization algorithm (ILUT*) that requires less time and is more scalable than ILUT. Our experiments on Cray T3D show that our parallel ILUT* algorithm achieve a high degree of concurrency, and when used as a preconditioner, it is comparable in quality to the unmodified ILUT algorithm. Furthermore, our experiments using the GMRES iterative solver show that the amount of time spent in computing the factorization using the ILUT* algorithm is usually much less than the amount of time required to solve the systems.