Weighted Matrix Ordering and Parallel Banded Preconditioners for Iterative Linear System Solvers

Authors:
Murat Manguoglu;Mehmet Koyutürk;Ahmed H. Sameh;Ananth Grama
Affiliations:
mmanguog@cs.purdue.edu, sameh@cs.purdue.edu, ayg@cs.purdue.edu; koyuturk@eecs.case.edu
Venue:
SIAM Journal on Scientific Computing
Year:
2010

Abstract

The emergence of multicore architectures and highly scalable platforms motivates the development of novel algorithms and techniques that emphasize concurrency and are tolerant of deep memory hierarchies, as opposed to minimizing raw FLOP counts. While direct solvers are reliable, they are often slow and memory-intensive for large problems. Iterative solvers, on the other hand, are more efficient but, in the absence of robust preconditioners, lack reliability. While preconditioners based on incomplete factorizations (whenever they exist) are effective for many problems, their parallel scalability is generally limited. In this paper, we advocate the use of banded preconditioners instead and introduce a reordering strategy that enables their extraction. In contrast to traditional bandwidth reduction techniques, our reordering strategy takes into account the magnitude of the matrix entries, bringing the heaviest elements closer to the diagonal, thus enabling the use of banded preconditioners. When used with effective banded solvers (in our case, the Spike solver), we show that banded preconditioners (i) are more robust than the broad class of incomplete factorization-based preconditioners, (ii) deliver higher processor performance, resulting in faster time to solution, and (iii) scale to larger parallel configurations. We demonstrate these results experimentally on a large class of problems selected from diverse application domains.
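
The abstract does not spell out how the reordering is computed. As a rough illustration of the idea of moving heavy entries toward the diagonal, the sketch below uses a weighted spectral ordering: rows and columns are sorted by the Fiedler vector of a graph whose edge weights are |a_ij| + |a_ji|. This choice is an assumption made for illustration and may differ from the authors' procedure. The function names (`weighted_spectral_order`, `band_weight_fraction`, `extract_band`) and the small dense test matrix are hypothetical, and a dense eigensolver is used only to keep the example short.

```python
import numpy as np

def weighted_spectral_order(A):
    """Permutation that sorts indices by the Fiedler vector of the
    weighted graph with edge weights |a_ij| + |a_ji| (assumed scheme)."""
    W = np.abs(A) + np.abs(A.T)
    np.fill_diagonal(W, 0.0)              # ignore self-loops
    L = np.diag(W.sum(axis=1)) - W        # weighted graph Laplacian
    _, vecs = np.linalg.eigh(L)           # eigenvalues in ascending order
    return np.argsort(vecs[:, 1])         # sort by second-smallest eigenvector

def band_weight_fraction(A, k):
    """Fraction of sum |a_ij| that lies within k diagonals of the main one."""
    i, j = np.indices(A.shape)
    mask = np.abs(i - j) <= k
    return np.abs(A[mask]).sum() / np.abs(A).sum()

def extract_band(A, k):
    """Copy of A with everything outside the band |i - j| <= k set to zero."""
    i, j = np.indices(A.shape)
    return np.where(np.abs(i - j) <= k, A, 0.0)

# Hypothetical test problem: a nonsymmetric tridiagonal matrix whose
# rows and columns have been scrambled by an interleaving permutation.
n = 12
off = np.linspace(1.0, 2.0, n - 1)
A = 4.0 * np.eye(n) + np.diag(off, 1) + np.diag(0.5 * off, -1)
scramble = np.array([0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5, 11])
B = A[np.ix_(scramble, scramble)]         # heavy entries far from diagonal

p = weighted_spectral_order(B)
C = B[np.ix_(p, p)]                       # symmetrically reordered matrix
M = extract_band(C, 1)                    # candidate banded preconditioner

print("weight in band before:", band_weight_fraction(B, 1))
print("weight in band after: ", band_weight_fraction(C, 1))
```

In this toy case the underlying graph is a path, so the ordering recovers the tridiagonal structure and the extracted band `M` captures all of the weight. On real matrices the band would only approximate the reordered matrix, and `M` would be handed to a banded solver inside an iterative method, which is the role the abstract assigns to the Spike solver.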