Implementing a parallel matrix factorization library on the Cell Broadband Engine
Scientific Programming - High Performance Computing with the Cell Broadband Engine
Scheduling two-sided transformations using tile algorithms on multicore architectures
Scientific Programming
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures
ACM Transactions on Mathematical Software (TOMS)
Two new algorithms for one-sided bidiagonalization are presented. The first is a block version that reduces execution time by improving cache utilization through the use of BLAS 2.5 operations and more BLAS 3 operations. The second is adapted to parallel computation. When incorporated into singular value decomposition software, the second algorithm is faster than the corresponding ScaLAPACK routine in most cases. An error analysis is presented for the first algorithm. Numerical results and timings are presented for both algorithms.
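For readers unfamiliar with the term, the following is a minimal sketch of the basic idea behind one-sided bidiagonalization, not the block or parallel algorithms described in the abstract. It assumes a real matrix with at least as many rows as columns and full column rank. Householder reflections are applied from the right only, so that non-adjacent columns become orthogonal, and a Gram-Schmidt pass over the columns then yields an upper bidiagonal factor. The function name `one_sided_bidiagonalize` and all variable names are hypothetical, chosen only for this illustration.

```python
import numpy as np

def one_sided_bidiagonalize(A):
    """Sketch: return U, B, V with A ~= U @ B @ V.T and B upper bidiagonal.

    Assumes A is real, m >= n, and has full column rank (illustrative only).
    """
    F = np.array(A, dtype=float)          # working copy, becomes A @ V
    m, n = F.shape
    V = np.eye(n)                         # accumulates the right reflections

    # Phase 1: right-side Householder reflections.
    # After step k, column k is orthogonal to columns k+2, ..., n-1.
    for k in range(n - 2):
        z = F[:, k + 1:].T @ F[:, k]      # inner products with later columns
        norm_z = np.linalg.norm(z)
        if norm_z == 0.0:
            continue                      # already orthogonal, nothing to do
        v = z.copy()
        v[0] += np.copysign(norm_z, z[0]) # sign chosen to avoid cancellation
        v /= np.linalg.norm(v)
        # Apply H = I - 2 v v^T from the right to the trailing columns.
        F[:, k + 1:] -= 2.0 * np.outer(F[:, k + 1:] @ v, v)
        V[:, k + 1:] -= 2.0 * np.outer(V[:, k + 1:] @ v, v)

    # Phase 2: Gram-Schmidt on the columns of F. Because non-adjacent columns
    # are orthogonal, each column only needs the previous basis vector.
    U = np.zeros((m, n))
    B = np.zeros((n, n))
    B[0, 0] = np.linalg.norm(F[:, 0])
    U[:, 0] = F[:, 0] / B[0, 0]
    for j in range(1, n):
        B[j - 1, j] = U[:, j - 1] @ F[:, j]        # superdiagonal entry
        w = F[:, j] - B[j - 1, j] * U[:, j - 1]
        B[j, j] = np.linalg.norm(w)                # diagonal entry
        U[:, j] = w / B[j, j]
    return U, B, V
```

Since U and V have orthonormal columns, the singular values of B match those of A, which is why a bidiagonalization of this kind can serve as the first stage of singular value decomposition software. The loop above works one column at a time with matrix-vector products; the block version mentioned in the abstract is aimed at replacing much of that work with cache-friendlier higher-level BLAS operations.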