A block JRS algorithm for highly parallel computation of SVDs

  • Authors:
  • Mostafa I. Soliman; Sanguthevar Rajasekaran; Reda Ammar

  • Affiliations:
  • Computer & System Section, Electrical Engineering Department, South Valley University, Aswan, Egypt; Department of Computer Science and Engineering, University of Connecticut, Storrs; Department of Computer Science and Engineering, University of Connecticut, Storrs

  • Venue:
  • HPCC'07: Proceedings of the Third International Conference on High Performance Computing and Communications
  • Year:
  • 2007

Abstract

This paper presents a new algorithm for computing the singular value decomposition (SVD) on architectures with multilevel memory hierarchies. The algorithm is based on one-sided JRS iteration, which allows all Jacobi rotations of a sweep to be computed in parallel. One key point of the proposed block JRS algorithm is that it reuses data already loaded into cache memory. It does this by performing computations on matrix blocks (b rows) instead of on strips of vectors, as JRS iteration algorithms do. Another key point concerns convergence. On a reasonably large number of processors, the block JRS algorithm needs fewer sweeps than the one-sided JRS iteration algorithm, and its sweep count is closer to that of the cyclic Jacobi method, even though not all rotations within a block are independent. The relaxation technique makes it possible to calculate and apply all independent rotations of a block at the same time. On blocks of size b×n, the block JRS algorithm performs O(b²n) floating-point operations on O(bn) elements, so each element loaded into cache is reused roughly b times. In addition, on P parallel processors, each sweep requires (2P-1) steps based on block computations.
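The abstract builds on one-sided Jacobi rotations and a sweep made of (2P-1) parallel steps. The sketch below illustrates those two ingredients. It is the classical one-sided cyclic Jacobi (Hestenes) method with rotations applied to single rows, not the authors' block JRS algorithm or its relaxation technique. The row pairs are scheduled with a round-robin (tournament) ordering, which is our assumption; the abstract does not say which ordering the paper uses. With 2P units there are 2P-1 steps per sweep, and every step consists of P rotations on disjoint rows that could run concurrently. The function names `round_robin_pairs` and `one_sided_jacobi_singular_values` are hypothetical.

```python
import numpy as np


def round_robin_pairs(k):
    """Round-robin (circle-method) ordering for k items, k even.

    Returns k - 1 steps. Each step holds k/2 disjoint pairs, and every
    pair (i, j) with i < j appears exactly once over the whole sweep.
    """
    if k % 2:
        raise ValueError("k must be even")
    players = list(range(k))
    steps = []
    for _ in range(k - 1):
        steps.append([tuple(sorted((players[i], players[k - 1 - i])))
                      for i in range(k // 2)])
        # Keep players[0] fixed and rotate the others one position.
        players = [players[0], players[-1]] + players[1:-1]
    return steps


def one_sided_jacobi_singular_values(A, tol=1e-12, max_sweeps=30):
    """Singular values of A by one-sided Jacobi rotations on its rows.

    Illustrative sketch only (hypothetical helper, not the block JRS method).
    Returns (singular values in descending order, number of sweeps used).
    """
    X = np.array(A, dtype=float)           # working copy; rows get rotated
    k = min(X.shape)
    if X.shape[0] % 2:                      # pad with a zero row so rows pair up
        X = np.vstack([X, np.zeros(X.shape[1])])
    schedule = round_robin_pairs(X.shape[0])  # 2P - 1 steps for 2P rows
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        rotated = False
        for step in schedule:
            for i, j in step:               # disjoint pairs: independent rotations
                alpha, beta = X[i] @ X[i], X[j] @ X[j]
                gamma = X[i] @ X[j]
                if abs(gamma) <= tol * np.sqrt(alpha * beta):
                    continue                # rows already (numerically) orthogonal
                zeta = (beta - alpha) / (2.0 * gamma)
                sign = 1.0 if zeta >= 0 else -1.0
                t = sign / (abs(zeta) + np.sqrt(1.0 + zeta * zeta))
                c = 1.0 / np.sqrt(1.0 + t * t)
                s = c * t
                xi, xj = X[i].copy(), X[j].copy()
                X[i] = c * xi - s * xj      # this rotation makes rows i and j orthogonal
                X[j] = s * xi + c * xj
                rotated = True
        if not rotated:                     # converged: no rotation was needed
            break
    # Rows are now mutually orthogonal, and their norms are the singular values.
    sv = np.sort(np.linalg.norm(X, axis=1))[::-1]
    return sv[:k], sweep


if __name__ == "__main__":
    A = np.random.default_rng(1).standard_normal((4, 6))
    sv, sweeps = one_sided_jacobi_singular_values(A)
    print(sv, sweeps)
    print(np.linalg.svd(A, compute_uv=False))
```

For an 8-row matrix, `round_robin_pairs(8)` produces 7 steps of 4 disjoint pairs, which matches the (2P-1) count with P = 4. The paper applies the same idea to blocks of b rows rather than single rows. That blocked variant is not reproduced here.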