Macrotasking the singular value decomposition of block circulant matrices on the Cray-2

  • Authors:
  • J. R. Baker

  • Affiliations:
  • Department of Electrical Engineering and Computer Sciences, University of California, Berkeley; Research Medicine and Radiation Biophysics Division, Lawrence Berkeley Laboratory

  • Venue:
  • Proceedings of the 1989 ACM/IEEE conference on Supercomputing
  • Year:
  • 1989

Abstract

A parallel algorithm to compute the singular value decomposition (SVD) of block circulant matrices on the Cray-2 is described. For a block circulant matrix defined by M blocks of m × n elements each, computing the SVD with an algorithm for general matrices takes time bounded below by Ω(M³ min(m, n) mn). By combining fast Fourier transform (FFT) and SVD steps, block circulant singular value decomposition (BCSVD) reduces this lower bound to Ω(M min(m, n) mn), a relative saving of about M². Memory usage drops from Θ(M² mn) to Θ(M mn), a relative saving of about M. For M = m = n = 64, this reduces computation time from approximately 12 hours to 30 seconds and memory usage from 768 megabytes to 12 megabytes. The FFT portion of the BCSVD algorithm partitions well into n macrotasks with a granularity of Θ(mM log M), and the SVD portion partitions into M macrotasks with a granularity of Ω(min(m, n) mn). Again for M = m = n = 64, the FFT granularity is 29 ms and the SVD granularity is 428 ms. A prescheduled partitioning of tasks achieved a speedup of 3.06, with a process creation overhead of 2.63 ms. A more elaborate self-scheduling method with four synchronizing server processes achieved a speedup of 3.25 with four processors available, with a server synchronization overhead of 0.32 ms. In both cases the relative memory overhead was about 4% for data space and 40% for code space.
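The FFT-then-SVD structure described above can be sketched in NumPy. This is a minimal illustration, not the paper's Cray-2 code. It assumes the convention that the (i, l) block of the full matrix is A_((i − l) mod M). Under that convention, unitary block DFTs on the left and right turn the matrix into a block diagonal of Â_k = Σ_j A_j e^(−2πijk/M), so the singular values of the full (Mm) × (Mn) matrix are the union of those of the M small blocks. The helper names `block_circulant` and `bcsvd_singular_values` are hypothetical.

```python
import numpy as np


def block_circulant(blocks):
    """Assemble the full (M*m) x (M*n) block circulant matrix whose (i, l)
    block is blocks[(i - l) % M]. Used only to check the fast path."""
    M, m, n = blocks.shape
    C = np.zeros((M * m, M * n), dtype=blocks.dtype)
    for i in range(M):
        for l in range(M):
            C[i * m:(i + 1) * m, l * n:(l + 1) * n] = blocks[(i - l) % M]
    return C


def bcsvd_singular_values(blocks):
    """Singular values of a block circulant matrix from its M defining blocks.

    FFT step: a length-M FFT across the block index for each of the m*n
    element positions (NumPy performs them all in one call along axis 0).
    SVD step: M independent m x n SVDs, one per frequency k; no block
    depends on any other, so each is a natural macrotask.
    """
    blocks_hat = np.fft.fft(blocks, axis=0)  # shape (M, m, n), complex
    per_block = [np.linalg.svd(blocks_hat[k], compute_uv=False)
                 for k in range(blocks.shape[0])]
    # Sort descending to match the ordering np.linalg.svd uses.
    return np.sort(np.concatenate(per_block))[::-1]


rng = np.random.default_rng(0)
A = rng.standard_normal((8, 5, 3))  # M = 8 blocks, each 5 x 3
fast = bcsvd_singular_values(A)
full = np.linalg.svd(block_circulant(A), compute_uv=False)
print(fast.shape, np.allclose(fast, full))  # (24,) True
```

The fast path stores only the M blocks of m × n entries rather than the full (Mm) × (Mn) matrix. This mirrors the Θ(M mn) versus Θ(M² mn) memory comparison in the abstract.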
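The abstract contrasts prescheduled partitioning with self-scheduling by four synchronizing server processes, but it does not spell out either mechanism. The sketch below shows one plausible shape for each, assuming Python threads stand in for Cray-2 processes and each phase (FFT tasks, then SVD tasks) is a list of independent callables. In `prescheduled`, fresh workers are created for every phase and tasks are assigned round-robin up front. In `self_scheduled`, long-lived servers claim the next task index under a lock and meet at a barrier between phases. Both function names and the round-robin assignment are hypothetical.

```python
import threading


def prescheduled(phases, nproc):
    """Static partition: for each phase, start nproc fresh workers; worker p
    runs tasks p, p + nproc, p + 2*nproc, ... (creation cost paid per phase)."""
    results = [[None] * len(tasks) for tasks in phases]
    for ph, tasks in enumerate(phases):
        def worker(p, tasks=tasks, out=results[ph]):
            for t in range(p, len(tasks), nproc):
                out[t] = tasks[t]()
        threads = [threading.Thread(target=worker, args=(p,)) for p in range(nproc)]
        for th in threads:
            th.start()
        for th in threads:
            th.join()
    return results


def self_scheduled(phases, nservers):
    """Dynamic partition: nservers long-lived servers repeatedly claim the
    next unclaimed task index under a lock; a barrier separates phases so
    no SVD task starts before every FFT task has finished."""
    results = [[None] * len(tasks) for tasks in phases]
    counters = [0] * len(phases)  # next unclaimed task index per phase
    lock = threading.Lock()
    barrier = threading.Barrier(nservers)

    def server():
        for ph, tasks in enumerate(phases):
            while True:
                with lock:  # the per-task synchronization cost
                    t = counters[ph]
                    counters[ph] += 1
                if t >= len(tasks):
                    break
                results[ph][t] = tasks[t]()
            barrier.wait()  # wait until all servers have drained this phase

    threads = [threading.Thread(target=server) for _ in range(nservers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return results


fft_tasks = [(lambda i=i: i * i) for i in range(64)]  # stand-ins for n FFT tasks
svd_tasks = [(lambda k=k: -k) for k in range(64)]     # stand-ins for M SVD tasks
print(prescheduled([fft_tasks, svd_tasks], 4) == self_scheduled([fft_tasks, svd_tasks], 4))
```

Self-scheduling trades one thread creation per worker per phase for a lock acquisition per task, which balances load when task times vary. CPython threads only illustrate the control structure here; they would not reproduce the Cray-2 speedups.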