The linear-solve problems arising in chemical physics and many other fields involve large sparse matrices with a certain block structure, for which special block Jacobi preconditioners are found to be very efficient. A parallel implementation was presented in two previous papers [W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. I. Block Jacobi diagonalization, J. Comput. Phys. 219 (1) (2006) 185-197; W. Chen, B. Poirier, Parallel implementation of efficient preconditioned linear solver for grid-based applications in chemical physics. II. QMR linear solver, J. Comput. Phys. 219 (1) (2006) 198-209]. Excellent parallel scalability was observed for the preconditioner construction, but not for the matrix-vector product itself. In this paper, we introduce a new algorithm for the matrix-vector product with (1) greatly improved parallel scalability and (2) generalization to an arbitrary number of nodes and data sizes.