On iterative QR pre-processing in the parallel block-Jacobi SVD algorithm

Authors:
Martin Bečka;Gabriel Okša;Marián Vajteršic;Laura Grigori
Affiliations:
Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic;Institute of Mathematics, Dept. of Informatics, Slovak Academy of Sciences, Bratislava, Slovak Republic;Dept. of Computer Sciences, University of Salzburg, Salzburg, Austria;INRIA, University Paris Sud-11, Orsay, France
Venue:
Parallel Computing
Year:
2010

Abstract

An efficient version of the parallel two-sided block-Jacobi algorithm for the singular value decomposition of an m × n matrix A includes a pre-processing step, which consists of the QR factorization of A with column pivoting followed by an optional LQ factorization of the R-factor. The iterative two-sided block-Jacobi algorithm is then applied in parallel to the R-factor (or L-factor). For efficient computation of the parallel QR (or LQ) factorization with (or without) column pivoting as implemented in ScaLAPACK, a block-cyclic distribution of the matrix on an r × c process grid, with p = r × c, r, c ≥ 1, and block size n_b × n_b, is required so that all processors remain busy throughout the parallel QR (or LQ) factorization. Optimal values for the parameters r, c and n_b are estimated experimentally using matrices of order n = 4000 and 8000 with p = 8 and 16 processors, respectively. It turns out that the optimal values are about n_b = 100 and r=
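
The role of the block-cyclic distribution mentioned in the abstract can be illustrated with a small sketch. It is not the authors' code and does not call ScaLAPACK. It only assumes the standard 2D block-cyclic rule, with the first block placed on process (0, 0), and counts how many n_b × n_b blocks of the trailing submatrix each process still owns as a factorization advances. The function names `owner` and `trailing_block_counts` are hypothetical, and the grid shape r = 2, c = 4 is an arbitrary illustrative choice for p = 8, not a value taken from the paper.

```python
from collections import Counter


def owner(i, j, nb, r, c):
    """Process-grid coordinates owning global entry (i, j), 0-based.

    Assumes a 2D block-cyclic layout whose first block sits on process (0, 0).
    """
    return ((i // nb) % r, (j // nb) % c)


def trailing_block_counts(n, nb, r, c, k):
    """Count the nb x nb blocks of the trailing submatrix A[k:, k:] per process.

    k is assumed to be a multiple of nb to keep the sketch simple.
    """
    n_blocks = -(-n // nb)  # ceiling division: number of block rows/columns
    counts = Counter()
    for bi in range(k // nb, n_blocks):      # block rows still active
        for bj in range(k // nb, n_blocks):  # block columns still active
            counts[(bi % r, bj % c)] += 1
    return counts


if __name__ == "__main__":
    # n, nb and p = 8 follow the abstract; r = 2, c = 4 is an assumed grid shape.
    n, nb, r, c = 4000, 100, 2, 4
    for k in (0, 2000, 3600):
        counts = trailing_block_counts(n, nb, r, c, k)
        print(f"k={k}: active processes={len(counts)}, "
              f"blocks per process min={min(counts.values())}, "
              f"max={max(counts.values())}")
```

Under these assumptions every process keeps owning part of the shrinking trailing submatrix, which is the load-balancing property the distribution is meant to provide. With a plain (non-cyclic) block layout, processes holding the leading rows and columns would fall idle early.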