A recent paper showed how memory traffic can be reduced by reformulating the classic algorithm for reducing a matrix to bidiagonal form, a preprocessing step when computing the singular values of a dense matrix. The key is a reordering of the computation so that the most memory-intensive operations can be “fused.” In this article, we show that other operations that reduce matrices to condensed form (reduction to upper Hessenberg form and reduction to tridiagonal form) can be similarly reorganized, yielding different sets of operations that can be fused. By developing the algorithms within a common framework and notation, we make it easier to compare and contrast the different algorithms and their opportunities for optimization on sequential architectures. We discuss the algorithms, develop a simple model to estimate the speedup potential from fusing, and showcase performance improvements consistent with what the model predicts.
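To make the idea of fusing concrete, here is a minimal sketch in plain Python. It is an illustration under assumptions, not the article's algorithm: the abstract does not say which operations are fused, so the pair chosen here (y = A x and z = Aᵀ w) and the function names `unfused` and `fused` are hypothetical. The `reads` counter is a crude stand-in for memory traffic: it counts how many times an element of A is loaded.

```python
def unfused(A, x, w):
    """Compute y = A x and z = A^T w with two separate sweeps over A."""
    m, n = len(A), len(A[0])
    reads = 0  # number of times an element of A is loaded
    y = [0.0] * m
    for i in range(m):
        for j in range(n):
            y[i] += A[i][j] * x[j]
            reads += 1
    z = [0.0] * n
    for i in range(m):
        for j in range(n):
            z[j] += A[i][j] * w[i]
            reads += 1
    return y, z, reads


def fused(A, x, w):
    """Compute the same y and z, loading each element of A only once."""
    m, n = len(A), len(A[0])
    reads = 0
    y = [0.0] * m
    z = [0.0] * n
    for i in range(m):
        for j in range(n):
            a = A[i][j]  # single load, reused for both updates
            reads += 1
            y[i] += a * x[j]
            z[j] += a * w[i]
    return y, z, reads


if __name__ == "__main__":
    A = [[1.0, 2.0], [3.0, 4.0]]
    x = [1.0, 1.0]
    w = [1.0, 2.0]
    print(unfused(A, x, w))
    print(fused(A, x, w))
```

Both versions perform the same floating-point operations in the same order, so the results agree; only the number of loads from A differs (2mn versus mn). When such matrix-vector operations are limited by memory bandwidth rather than arithmetic, that ratio suggests the kind of upper bound a simple speedup model might give, though the model in the article itself may differ.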