Scalable hybrid sparse linear solvers

  • Authors: Keita Teranishi
  • Affiliations: The Pennsylvania State University
  • Venue: PhD dissertation, The Pennsylvania State University
  • Year: 2004


Abstract

In many large-scale simulations that depend on parallel processing to solve problems of scientific interest, the application time can be dominated by the time spent solving the underlying sparse linear systems. This thesis concerns the development of effective sparse solvers for distributed memory multiprocessors using a hybrid of direct and iterative sparse solution methods. More specifically, we accelerate the convergence of an iterative method, the method of Conjugate Gradients (CG), using an incomplete Cholesky preconditioner; the latter is an approximation to the sparse matrix factor used in a direct method. Our parallel incomplete factorization scheme supports a range of fill-in, providing flexible preconditioning that can meet the requirements of a variety of applications. We have also developed special techniques that allow such preconditioners to be applied effectively on distributed memory multiprocessors, where the relatively large latencies of interprocessor communication make conventional schemes based on parallel substitution extremely inefficient.

The first part of the dissertation focuses on the design of a parallel, tree-based, left-looking, drop-threshold incomplete Cholesky factorization scheme using extensions of techniques from direct methods. The second part concerns modifications to the incomplete Cholesky factor that enable its efficient application as a preconditioner; these modifications selectively replace certain triangular submatrices in the factor by their approximate inverses. We develop a ‘Selective Inversion’ (SI) scheme based on explicit inversion of selected submatrices, and a variant using Selective Sparse Approximate Inversion (SSAI). The final part of the dissertation concerns latency-tolerant application of our ICT-SI and ICT-SSAI preconditioners by selectively using parallel matrix-vector multiplication instead of parallel substitution.

We analyze the computation and communication costs of all our schemes for model sparse matrices arising from finite difference methods on regular domains in two and three dimensions. We also provide extensive empirical results on the performance of our methods on such model matrices and on others from practical applications. Our results demonstrate that both our ICT-SI and ICT-SSAI hybrid solvers are significantly more reliable than other preconditioned CG solvers. Furthermore, although their scalability lags that of some simpler schemes, they can still be the method of choice for matrices that require relatively strong preconditioning for CG to converge. Our analysis and experiments indicate that ICT-SSAI is more scalable than ICT-SI; however, our experiments also indicate that this scalability comes at the expense of a slight decrease in preconditioning quality. We have thus developed scalable and reliable hybrid solvers that can potentially provide significant improvements in the performance of modeling and simulation applications.
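As a concrete illustration of the core pipeline above, a drop-threshold incomplete factorization used to precondition CG, the following serial SciPy sketch solves a 2-D finite-difference model problem of the kind analyzed in the thesis. SciPy provides incomplete LU with a drop tolerance (spilu) rather than a dedicated incomplete Cholesky routine, so spilu stands in for the ICT factorization here; the grid size and drop tolerance are illustrative choices, not values from the thesis.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Model problem: 5-point finite-difference Laplacian on an n-by-n grid (SPD),
# the class of model matrices used in the thesis's cost analysis.
n = 32
I = sp.identity(n, format="csc")
T = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
A = (sp.kron(I, T) + sp.kron(T, I)).tocsc()
b = np.ones(A.shape[0])

# Drop-threshold incomplete factorization as a preconditioner.
# SciPy exposes incomplete LU (spilu), not incomplete Cholesky; for this
# SPD matrix it plays the same role. drop_tol controls fill-in, mirroring
# the "range of fill-in" flexibility described in the abstract.
ilu = spla.spilu(A, drop_tol=1e-3, fill_factor=10)
M = spla.LinearOperator(A.shape, ilu.solve)

# Preconditioned Conjugate Gradients.
x, info = spla.cg(A, b, M=M)
print("info =", info, " residual =", np.linalg.norm(b - A @ x))

Tightening drop_tol yields a denser, stronger preconditioner and fewer CG iterations at a higher factorization cost, which is the trade-off the thesis's flexible fill-in support is designed to navigate.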
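The motivation for Selective Inversion can likewise be shown on a single dense triangular block. Applying an incomplete factor by substitution resolves one unknown at a time, a serial chain that is latency-bound on distributed memory machines; inverting the block once turns each later application into a matrix-vector product. This toy is serial and only demonstrates the algebraic equivalence under a hypothetical block; the thesis's contribution is performing such replacements selectively, on submatrices of the parallel factor.

import numpy as np
from scipy.linalg import solve_triangular

rng = np.random.default_rng(0)
m = 6

# A well-conditioned lower triangular block, standing in for a triangular
# submatrix of the incomplete Cholesky factor.
L = np.tril(rng.standard_normal((m, m))) + m * np.eye(m)
y = rng.standard_normal(m)

# Conventional application: forward substitution (inherently sequential).
x_subst = solve_triangular(L, y, lower=True)

# Selective Inversion idea: pay a one-time inversion cost so that every
# later application is a matrix-vector product, which parallelizes well.
L_inv = np.linalg.inv(L)
x_matvec = L_inv @ y

assert np.allclose(x_subst, x_matvec)

The SSAI variant replaces the exact inverse with a sparse approximate one, trading a little preconditioning quality for lower cost and better scalability, consistent with the trade-off reported in the abstract.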