Research note: Parallel implementation of an efficient preconditioned linear solver for grid-based applications in chemical physics. III: Improved parallel scalability for sparse matrix-vector products

Authors:
Wenwu Chen;Bill Poirier
Affiliations:
Department of Chemistry and Biochemistry, Texas Tech University, Box 41061, Lubbock, TX 79409-1061, USA and Department of Physics, Texas Tech University, Box 41061, Lubbock, TX 79409-1061, USA;Department of Chemistry and Biochemistry, Texas Tech University, Box 41061, Lubbock, TX 79409-1061, USA and Department of Physics, Texas Tech University, Box 41061, Lubbock, TX 79409-1061, USA
Venue:
Journal of Parallel and Distributed Computing
Year:
2010

Abstract

Linear systems arising in chemical physics and many other fields involve large sparse matrices with a particular block structure, for which specialized block Jacobi preconditioners are found to be very efficient. In two previous papers [W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. I. Block Jacobi diagonalization, J. Comput. Phys. 219 (1) (2006) 185-197; W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. II. QMR linear solver, J. Comput. Phys. 219 (1) (2006) 198-209], a parallel implementation of this approach was presented. Excellent parallel scalability was observed for preconditioner construction, but not for the matrix-vector product itself. In this paper, we introduce a new algorithm with (1) greatly improved parallel scalability and (2) generalization to an arbitrary number of nodes and data sizes.
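
For readers unfamiliar with the operation in question, the following is a minimal sketch of a row-partitioned sparse matrix-vector product, not the algorithm introduced in the paper. It assumes a generic CSR (compressed sparse row) layout and a simple contiguous row split that works for any number of nodes and any matrix size. The function names (`partition_rows`, `csr_matvec_rows`, `distributed_matvec`) and the small tridiagonal test matrix are hypothetical, and the "nodes" are simulated serially rather than with real message passing.

```python
def partition_rows(n_rows, n_nodes):
    """Split range(n_rows) into n_nodes contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n_rows, n_nodes)
    bounds = []
    start = 0
    for rank in range(n_nodes):
        # The first `extra` ranks each take one additional row.
        stop = start + base + (1 if rank < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def csr_matvec_rows(indptr, indices, data, x, start, stop):
    """Compute rows start..stop-1 of y = A x for a matrix A stored in CSR form."""
    y_local = []
    for i in range(start, stop):
        acc = 0.0
        for k in range(indptr[i], indptr[i + 1]):
            acc += data[k] * x[indices[k]]
        y_local.append(acc)
    return y_local


def distributed_matvec(indptr, indices, data, x, n_nodes):
    """Simulate n_nodes workers, each owning one contiguous block of rows."""
    n_rows = len(indptr) - 1
    y = []
    for start, stop in partition_rows(n_rows, n_nodes):
        # In a real distributed code, each block would run on its own node
        # and the pieces would be gathered with a communication step.
        y.extend(csr_matvec_rows(indptr, indices, data, x, start, stop))
    return y


# Hypothetical 5x5 tridiagonal matrix (2 on the diagonal, -1 off it) in CSR form.
indptr = [0, 2, 5, 8, 11, 13]
indices = [0, 1, 0, 1, 2, 1, 2, 3, 2, 3, 4, 3, 4]
data = [2, -1, -1, 2, -1, -1, 2, -1, -1, 2, -1, -1, 2]
x = [1, 2, 3, 4, 5]

y = distributed_matvec(indptr, indices, data, x, n_nodes=3)
```

In this sketch the row count (5) is not divisible by the node count (3), so the blocks have sizes 2, 2, and 1. A production code would also have to exchange the entries of `x` that each node needs but does not own, and that communication is typically where scalability is won or lost.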