A parallel ring ordering algorithm for efficient one-sided Jacobi SVD computations
Journal of Parallel and Distributed Computing
Dynamic ordering for a parallel block-Jacobi SVD algorithm
Parallel Computing - Parallel matrix algorithms and applications
Sourcebook of parallel computing
A novel scheme for the parallel computation of SVDs
HPCC'06 Proceedings of the Second international conference on High Performance Computing and Communications
Fast dimension reduction for document classification based on imprecise spectrum analysis
Information Sciences: an International Journal
This paper presents a new algorithm for computing the singular value decomposition (SVD) on architectures with multilevel memory hierarchies. The algorithm is based on the one-sided JRS iteration, which allows all Jacobi rotations of a sweep to be computed in parallel. One key feature of the proposed block JRS algorithm is that it reuses data already loaded into cache memory by operating on matrix blocks (b rows each) rather than on strips of vectors, as JRS iteration algorithms do. Another key feature is that, on a reasonably large number of processors, it needs fewer sweeps than the one-sided JRS iteration algorithm and comes closer to the sweep count of the cyclic Jacobi method, even though not all rotations within a block are independent. The relaxation technique makes it possible to calculate and apply all independent rotations of a block at the same time. On blocks of size b×n, the block JRS algorithm performs O(b²n) floating-point operations on O(bn) elements, so each element loaded into cache is reused about b times. In addition, on P parallel processors, each sweep requires (2P-1) steps of block computations.
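To make the block structure and the O(b²n)-work-on-O(bn)-data argument concrete, here is a minimal pure-Python sketch of a generic block one-sided (Hestenes) Jacobi SVD that orthogonalizes rows. It is not the paper's block JRS algorithm: the relaxation technique is not reproduced (rotations inside a block pair are applied one after another), and the block ordering is an assumed standard round-robin (circle-method) tournament, which pairs 2P blocks in 2P-1 steps of P disjoint pairs each. All function names (`rotate_pair`, `process_block_pair`, `round_robin_steps`, `block_one_sided_jacobi`) are hypothetical.

```python
import math


def rotate_pair(A, i, j, tol):
    """Apply one Hestenes rotation to rows i and j so they become orthogonal.
    Returns True if a rotation was needed (rows not yet orthogonal to tol)."""
    ri, rj = A[i], A[j]
    alpha = sum(x * x for x in ri)
    beta = sum(y * y for y in rj)
    gamma = sum(x * y for x, y in zip(ri, rj))
    if abs(gamma) <= tol * math.sqrt(alpha * beta):
        return False
    zeta = (beta - alpha) / (2.0 * gamma)
    # Smaller-magnitude root of t^2 + 2*zeta*t - 1 = 0 (numerically stable choice).
    t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
    c = 1.0 / math.sqrt(1.0 + t * t)
    s = c * t
    A[i] = [c * x - s * y for x, y in zip(ri, rj)]
    A[j] = [s * x + c * y for x, y in zip(ri, rj)]
    return True


def process_block_pair(A, rows_i, rows_j, tol):
    """Rotate every pair of rows in the union of two b-row blocks.
    The 2b rows (O(bn) data) receive O(b^2) rotations of O(n) flops each."""
    rows = rows_i + rows_j
    rotated = 0
    for p in range(len(rows)):
        for q in range(p + 1, len(rows)):
            if rotate_pair(A, rows[p], rows[q], tol):
                rotated += 1
    return rotated


def round_robin_steps(num_blocks):
    """Circle-method tournament (assumed ordering): for 2P blocks, return
    2P-1 steps, each a list of P disjoint block pairs."""
    if num_blocks % 2 != 0:
        raise ValueError("number of blocks must be even")
    ids = list(range(num_blocks))
    steps = []
    for _ in range(num_blocks - 1):
        steps.append([(ids[k], ids[num_blocks - 1 - k]) for k in range(num_blocks // 2)])
        ids = [ids[0], ids[-1]] + ids[1:-1]  # keep ids[0] fixed, rotate the rest
    return steps


def block_one_sided_jacobi(A, b, tol=1e-12, max_sweeps=30):
    """Return (singular values in descending order, sweeps used).
    Assumes the row count is an even multiple of the block size b."""
    work = [[float(x) for x in row] for row in A]  # copy; rows are rotated in place
    m = len(work)
    if m % b != 0 or (m // b) % 2 != 0:
        raise ValueError("this sketch assumes len(A) is an even multiple of b")
    blocks = [list(range(k * b, (k + 1) * b)) for k in range(m // b)]
    steps = round_robin_steps(len(blocks))
    sweeps = 0
    for _ in range(max_sweeps):
        sweeps += 1
        rotated = 0
        for step in steps:          # 2P-1 steps per sweep
            for bi, bj in step:     # P disjoint pairs: independent, parallel in principle
                rotated += process_block_pair(work, blocks[bi], blocks[bj], tol)
        if rotated == 0:
            break
    # Rows are now mutually orthogonal; their norms are the singular values of A.
    sigma = sorted((math.sqrt(sum(x * x for x in row)) for row in work), reverse=True)
    return sigma, sweeps
```

For example, `block_one_sided_jacobi([[1, 2, 3], [4, 5, 6], [7, 8, 10], [2, 1, 0]], b=2)` treats the 4×3 matrix as two blocks of two rows. Because the pairs within one step touch disjoint blocks, a parallel version could hand each pair to its own processor; the sequential loop here only illustrates the data access pattern.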