Matrix computations (3rd ed.)
Algorithm 807: The SBR Toolbox—software for successive band reduction
ACM Transactions on Mathematical Software (TOMS)
SIAM Journal on Scientific Computing
The design and implementation of the MRRR algorithm
ACM Transactions on Mathematical Software (TOMS)
Benchmarking GPUs to tune dense linear algebra
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Solving Dense Linear Systems on Graphics Processors
Euro-Par '08 Proceedings of the 14th international Euro-Par conference on Parallel Processing
Accelerating model reduction of large linear systems with graphics processors
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2
Divide and Conquer on Hybrid GPU-Accelerated Multicore Systems
SIAM Journal on Scientific Computing
Accelerating BST methods for model reduction with graphics processors
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
PPAM'11 Proceedings of the 9th international conference on Parallel Processing and Applied Mathematics - Volume Part I
High-performance bidiagonal reduction using tile algorithms on homogeneous multicore architectures
ACM Transactions on Mathematical Software (TOMS)
Toward a scalable multi-GPU eigensolver via compute-intensive kernels and efficient communication
Proceedings of the 27th international ACM conference on International conference on supercomputing
Hi-index | 0.00 |
We investigate the performance of the routines in LAPACK and the Successive Band Reduction (SBR) toolbox for the reduction of a dense matrix to tridiagonal form, a crucial preprocessing stage in the solution of the symmetric eigenvalue problem, on general-purpose multicore processors. In response to the advances of hardware accelerators, we also modify the code in SBR to accelerate the computation by offloading a significant part of the operations to a graphics processor (GPU). Performance results illustrate the parallelism and scalability of these algorithms on current high-performance multi-core architectures.