Computer
Designing efficient algorithms for parallel computers
Designing efficient algorithms for parallel computers
The fast Fourier transform and its applications
The fast Fourier transform and its applications
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
The art of computer programming, volume 1 (3rd ed.): fundamental algorithms
Numerical Methods, Software and Analysis
Numerical Methods, Software and Analysis
Digital Image Restoration
Hi-index | 0.00 |
A parallel algorithm to compute the singular value decomposition (SVD) of block circulant matrices on the Cray-2 is described. For a block circulant form described by M blocks with m x n elements in each block, the computation time using an SVD algorithm for general matrices has a lower bound &OHgr;(M3min(m, n)mn). Using a combination of fast Fourier transform (FFT) and SVD steps, the computation time for block circulant singular value decomposition (BCSVD) has a lower bound &OHgr;(Mmin(m, n)mn); a relative savings of ~ M2. Memory usage bounds are reduced from &THgr;(M2mn) to &THgr;(Mmn); a relative savings of ~ M. For M = m = n = 64, this decreases the computation time from approximately 12 hours to 30 seconds and memory usage is reduced from 768 megabytes to 12 megabytes. The BCSVD algorithm partitions well into n macrotasks with a granularity of &THgr;(mM log M) for the FFT portion of the algorithm. The SVD portion of the algorithm partitions into M macrotasks with a granularity of &OHgr;(min(m, n)mn). Again, for the case where M = m = n = 64, the FFT granularity is 29ms and the SVD granularity is 428ms. A speedup of 3.06 was achieved by using a prescheduled partitioning of tasks. The process creation overhead was 2.63ms. Using a more elaborate self-scheduling method with four synchronizing server processes, a speedup of 3.25 was observed with four processors available. The server synchronization overhead was 0.32ms. Relative memory overhead in both cases was about 4% for data space and 40% for code space.