An extended set of FORTRAN basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
A set of level 3 basic linear algebra subprograms
ACM Transactions on Mathematical Software (TOMS)
Data and computation transformations for multiprocessors
PPOPP '95 Proceedings of the fifth ACM SIGPLAN symposium on Principles and practice of parallel programming
Matrix computations (3rd ed.)
LAPACK Users' guide (third ed.)
Computing the Singular-Value Decomposition on the ILLIAC IV
ACM Transactions on Mathematical Software (TOMS)
Communications of the ACM - Special issue on computer architecture
Trident: a scalable architecture for scalar, vector, and matrix operations
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Computer architecture: a quantitative approach
Solving Linear Systems on Vector and Shared Memory Computers
A Simulation Study of Decoupled Vector Architectures
The Journal of Supercomputing
Very Long Instruction Word architectures and the ELI-512
ISCA '83 Proceedings of the 10th annual international symposium on Computer architecture
Trident: Technology-Scalable Architecture for Data Parallel Applications
IPDPS '03 Proceedings of the 17th International Symposium on Parallel and Distributed Processing
The Design of a Parallel Dense Linear Algebra Software Library: Reduction to Hessenberg, Tridiagonal, and Bidiagonal Form
Vector microprocessors
Scalable vector media-processors for embedded systems
Neural, Parallel & Scientific Computations
A highly efficient implementation of a backpropagation learning algorithm using matrix ISA
Journal of Parallel and Distributed Computing
This paper discusses the parallel implementation and evaluation of the reduction of a dense matrix to bidiagonal form on the Trident processor. The standard Golub and Kahan Householder bidiagonalization algorithm, which is rich in matrix-vector operations, and the LAPACK subroutine _GEBRD, which uses a mixture of vector, matrix-vector, and matrix operations, are simulated on the Trident processor. We show how to use the Trident parallel execution units, ring, and communication registers to effectively perform the vector, matrix-vector, and matrix operations needed to bidiagonalize a matrix. The number of clock cycles per FLOP is used as the metric for evaluating the performance of the Trident processor. Our results show that high efficiency is attained by using matrix-vector and matrix operations as much as possible, because they reduce the ratio of memory accesses to FLOPs. On a 32K×32K matrix, applying matrix-vector operations to the standard Golub and Kahan algorithm on 128 Trident lanes yields a speedup of around 190 times (superlinear) over using only vector operations on one lane, and around two times over vector operations on 128 lanes. Using matrix operations in the _GEBRD subroutine yields a speedup of around 307 times (superlinear) over vector operations on one lane, 3.2 times over vector operations on 128 lanes, and 1.3 times over matrix-vector operations.
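For reference, the Golub-Kahan procedure mentioned above reduces a matrix to upper bidiagonal form by alternating left and right Householder reflections; the updates are dominated by matrix-vector products and rank-one updates, which is exactly the operation mix the paper maps onto the Trident lanes. Below is a minimal NumPy sketch of the textbook algorithm (not the paper's Trident implementation; the function names are illustrative):

```python
import numpy as np

def householder(x):
    """Return (v, beta) so that (I - beta * v v^T) x = +/- ||x|| e1."""
    v = np.asarray(x, dtype=float).copy()
    normx = np.linalg.norm(x)
    if normx == 0.0:
        return v, 0.0
    # Choose the sign that avoids cancellation in v[0].
    v[0] += np.sign(x[0]) * normx if x[0] != 0 else normx
    beta = 2.0 / np.dot(v, v)
    return v, beta

def bidiagonalize(A):
    """Golub-Kahan bidiagonalization of an m x n matrix (m >= n).

    Returns B, upper bidiagonal, with the same singular values as A.
    The orthogonal factors U and V are not accumulated in this sketch.
    """
    B = np.asarray(A, dtype=float).copy()
    m, n = B.shape
    for k in range(n):
        # Left reflection: zero out B[k+1:, k] (matrix-vector rich update).
        v, beta = householder(B[k:, k])
        B[k:, k:] -= beta * np.outer(v, v @ B[k:, k:])
        if k < n - 2:
            # Right reflection: zero out B[k, k+2:].
            v, beta = householder(B[k, k + 1:])
            B[k:, k + 1:] -= beta * np.outer(B[k:, k + 1:] @ v, v)
    return B
```

Each left update `v @ B[k:, k:]` is a matrix-vector product followed by a rank-one correction; reorganizing these into blocked matrix operations (as _GEBRD does) is what reduces memory traffic per FLOP.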