This paper presents a 7-step, semi-systematic approach to designing and implementing parallel algorithms, with MPI as the message-passing layer of the target implementation. The approach is applied to a family of matrix factorization algorithms (LU, QR, and Cholesky) that share a common structure: the second factor of each is upper triangular. The efficacy of the approach is demonstrated by implementing, tuning, and timing the algorithms on two commercially available multiprocessor computers.
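The shared structure mentioned in the abstract can be illustrated with a small serial sketch. This is not the paper's parallel MPI implementation. It is a minimal pure-Python illustration under simplifying assumptions: square input, LU without pivoting, QR by modified Gram-Schmidt, and Cholesky written in the form A = RᵀR. All function names (`lu_no_pivot`, `qr_gram_schmidt`, `cholesky_upper`, `is_upper_triangular`, `matmul`) are hypothetical and chosen only for this example.

```python
import math


def matmul(x, y):
    """Plain triple-loop product of two square lists-of-lists."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def is_upper_triangular(m, tol=1e-12):
    """True if every entry strictly below the diagonal is (near) zero."""
    return all(abs(m[i][j]) <= tol for i in range(len(m)) for j in range(i))


def lu_no_pivot(a):
    """Doolittle LU, assuming no pivoting is needed: A = L U."""
    n = len(a)
    lower = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    upper = [row[:] for row in a]
    for k in range(n):
        for i in range(k + 1, n):
            lower[i][k] = upper[i][k] / upper[k][k]
            for j in range(k, n):
                upper[i][j] -= lower[i][k] * upper[k][j]
    return lower, upper


def qr_gram_schmidt(a):
    """Modified Gram-Schmidt QR for a full-rank square matrix: A = Q R."""
    n = len(a)
    q_cols = []
    r = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v = [a[i][j] for i in range(n)]  # column j of A
        for k in range(j):
            # Project out the component along each earlier column of Q.
            r[k][j] = sum(q_cols[k][i] * v[i] for i in range(n))
            v = [v[i] - r[k][j] * q_cols[k][i] for i in range(n)]
        r[j][j] = math.sqrt(sum(x * x for x in v))
        q_cols.append([x / r[j][j] for x in v])
    q = [[q_cols[j][i] for j in range(n)] for i in range(n)]
    return q, r


def cholesky_upper(a):
    """Cholesky for a symmetric positive definite matrix: A = R^T R."""
    n = len(a)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = a[i][j] - sum(r[k][i] * r[k][j] for k in range(i))
            r[i][j] = math.sqrt(s) if i == j else s / r[i][i]
    return r


if __name__ == "__main__":
    # Small symmetric positive definite test matrix (an assumption of this
    # sketch, so that all three factorizations apply to the same input).
    a = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
    _, u = lu_no_pivot(a)
    _, r_qr = qr_gram_schmidt(a)
    r_chol = cholesky_upper(a)
    for name, factor in [("LU", u), ("QR", r_qr), ("Cholesky", r_chol)]:
        print(name, "second factor upper triangular:",
              is_upper_triangular(factor))
```

In each case the second factor (U, R, or R) is upper triangular, which is the property the abstract singles out as the common structure of the three algorithms.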