An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an $m \times n$ matrix $A$ includes a pre-processing step, which consists of the QR factorization of $A$ with column pivoting, optionally followed by the LQ factorization of the R-factor. The iterative two-sided block-Jacobi algorithm is then applied in parallel to the R-factor (or L-factor). For the parallel QR (or LQ) factorization with (or without) column pivoting, as implemented in ScaLAPACK, to run efficiently, the matrix must be stored in a block-cyclic distribution on an $r \times c$ process grid with $p = r \times c$, $r, c \geq 1$, and block size $n_b \times n_b$, chosen so that all processors remain busy throughout the factorization. Optimal values of the parameters $r$, $c$ and $n_b$ are estimated experimentally using matrices of order $n = 4000$ and $8000$ with $p = 8$ and $16$ processors, respectively. It turns out that the optimal values are about $n_b = 100$ and $r =$
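
As an illustration of the block-cyclic distribution mentioned in the abstract, the following is a minimal sketch of the standard two-dimensional block-cyclic mapping of a square matrix onto an r x c process grid. It is not taken from the paper: the function names `block_owner` and `blocks_per_process` are hypothetical, indices are zero-based, and the first block is assumed to sit on process (0, 0). The sketch only counts how many blocks each process owns; it says nothing about the run time of the factorization itself.

```python
from collections import Counter


def block_owner(i, j, nb, r, c):
    """Return the (process row, process column) that owns global entry (i, j).

    Assumes zero-based indices, square nb x nb blocks, and that the first
    block is stored on process (0, 0).
    """
    return ((i // nb) % r, (j // nb) % c)


def blocks_per_process(n, nb, r, c):
    """Count the blocks of an n x n matrix held by each process of an r x c grid.

    Edge blocks may be smaller than nb x nb when nb does not divide n;
    they are still counted as one block each.
    """
    nblocks = -(-n // nb)  # ceiling division: number of block rows/columns
    counts = Counter()
    for bi in range(nblocks):
        for bj in range(nblocks):
            counts[(bi % r, bj % c)] += 1
    return counts


if __name__ == "__main__":
    n, nb = 4000, 100  # matrix order and block size quoted in the abstract
    for r, c in [(1, 8), (2, 4), (4, 2), (8, 1)]:  # all grids with p = 8
        counts = blocks_per_process(n, nb, r, c)
        print((r, c), "min blocks:", min(counts.values()),
              "max blocks:", max(counts.values()))
```

For these sizes every grid shape gives each process the same number of blocks, so a static block count cannot distinguish between the shapes. This is consistent with the abstract's approach of estimating the parameters experimentally.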