Memory hierarchy exploration for accelerating the parallel computation of SVDs

  • Authors: Mostafa I. Soliman
  • Affiliations: Computer & System Section, Electrical Engineering Department, Faculty of Engineering, South Valley University, Aswan, Egypt
  • Venue: Neural, Parallel & Scientific Computations
  • Year: 2008

Abstract

The performance of many applications on modern computers is often limited by memory latency rather than by processor speed. On computers with a memory hierarchy, it is preferable to compute on blocks of data, which reduces the impact of memory latency by reusing data already loaded into cache. This paper proposes a fast parallel algorithm, based on the one-sided Jacobi method, for computing the singular value decomposition (SVD) on architectures with a multi-level memory hierarchy. On P parallel processors, the given matrix is divided into super-rows, and these super-rows are then partitioned into 2P blocks. One key point of the proposed algorithm is its heavy exploitation of the memory hierarchy: all computations are performed on super-rows loaded in cache memory rather than on individual rows. A second key point is that the number of sweeps required for convergence is very close to that of cyclic one-sided Jacobi. A third key point is that the number of sweeps required for convergence does not depend strongly on the size of the input matrix. On two dual-core Intel Xeon processors, our results show that the parallel implementation of the proposed algorithm achieves around 11 times the performance of the sequential implementation on the same hardware. Moreover, a performance of around 10 GFLOPS (double precision) can be achieved on the target system using multi-threading, Intel SIMD instructions, matrix blocking, and loop unrolling.
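For readers unfamiliar with the baseline the abstract compares against, the following is a minimal sketch of plain cyclic one-sided Jacobi applied to matrix rows. It is not the paper's blocked, parallel super-row algorithm: it is sequential, unblocked, and written in pure Python for clarity rather than speed. The function name, the tolerance `tol`, and the sweep limit `max_sweeps` are illustrative assumptions, not taken from the paper.

```python
import math


def one_sided_jacobi_singular_values(a, tol=1e-12, max_sweeps=60):
    """Sketch of cyclic one-sided Jacobi on rows (assumed helper, not the paper's code).

    Returns (singular values in descending order, number of sweeps performed).
    """
    rows = [[float(x) for x in r] for r in a]  # work on a copy of the input
    n = len(rows)
    sweeps = 0
    for sweeps in range(1, max_sweeps + 1):
        rotated = False
        # One sweep visits every row pair (i, j) with i < j in cyclic order.
        for i in range(n - 1):
            for j in range(i + 1, n):
                ri, rj = rows[i], rows[j]
                alpha = sum(x * x for x in ri)             # squared norm of row i
                beta = sum(y * y for y in rj)              # squared norm of row j
                gamma = sum(x * y for x, y in zip(ri, rj)) # inner product of the pair
                # Skip pairs that are already orthogonal to within tol.
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                rotated = True
                # Plane rotation (smaller-angle root) that zeroes the inner product.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                rows[i] = [c * x - s * y for x, y in zip(ri, rj)]
                rows[j] = [s * x + c * y for x, y in zip(ri, rj)]
        if not rotated:
            break  # a full sweep with no rotation: rows are mutually orthogonal
    # Once the rows are orthogonal, their norms are the singular values.
    sigma = sorted((math.sqrt(sum(x * x for x in r)) for r in rows), reverse=True)
    return sigma, sweeps
```

Calling `one_sided_jacobi_singular_values([[3, 0], [4, 5]])` returns the two singular values together with the sweep count. Each rotation touches only two rows, which is what makes it natural to group rows into larger blocks that stay resident in cache, as the abstract describes for super-rows.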