Matrix bidiagonalization: implementation and evaluation on the Trident processor

  • Authors:
  • Mostafa I. Soliman; Stanislav G. Sedukhin

  • Affiliations:
  • Graduate School of Computer Science and Engineering, The University of Aizu, Aizu-Wakamatsu City, Fukushima, 965-8580 Japan (both authors)

  • Venue:
  • Neural, Parallel & Scientific Computations
  • Year:
  • 2003


Abstract

This paper discusses the parallel implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which is rich in a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform the vector, matrix-vector, and matrix operations needed for bidiagonalizing a matrix. The number of clock cycles per FLOP is used as a metric to evaluate the performance of the Trident processor. Our results show that high efficiency is attained by using matrix-vector and matrix operations as much as possible, because they reduce the ratio of memory accesses to floating-point operations. On a 32K×32K matrix with 128 Trident lanes, using matrix-vector operations in the standard Golub and Kahan algorithm gives a speedup of around 190 times (superlinear) over using only vector operations on one lane, and around two times over using vector operations on 128 lanes. Using matrix operations in the _GEBRD subroutine gives a speedup of around 307 times (superlinear) over using vector operations on one lane, 3.2 times over using vector operations on 128 lanes, and 1.3 times over using matrix-vector operations.
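
For reference, the following is a minimal NumPy sketch of the standard Golub and Kahan Householder bidiagonalization mentioned in the abstract. The function names and the dense two-sided update scheme are illustrative assumptions for a plain sequential version; this is not the Trident implementation evaluated in the paper, and it omits the blocked (matrix-operation-rich) reorganization used by _GEBRD.

```python
import numpy as np

def householder(x):
    """Return (v, beta) such that (I - beta*v*v^T) maps x onto a multiple of e_1."""
    v = x.astype(float).copy()
    sigma = np.linalg.norm(x)
    if sigma == 0.0:
        return v, 0.0
    # Choose the sign that avoids cancellation in v[0].
    v[0] += np.sign(x[0]) * sigma if x[0] != 0 else sigma
    beta = 2.0 / np.dot(v, v)
    return v, beta

def golub_kahan_bidiag(A):
    """Reduce A (m x n, m >= n) to upper bidiagonal form B = U^T A V
    by alternating left and right Householder reflections."""
    B = A.astype(float).copy()
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero the entries below the diagonal in column k.
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])      # rank-1, matrix-vector-rich update
        if k < n - 2:
            # Right reflection: zero the entries right of the superdiagonal in row k.
            v, beta = householder(B[k, k+1:])
            B[k:, k+1:] -= beta * np.outer(B[k:, k+1:] @ v, v)
    return B
```

At each step the dominant cost is the pair of rank-1 updates on the trailing submatrix; this is the matrix-vector work that, per the abstract, the paper maps onto the Trident execution units, ring, and communication registers.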