With the raw computing power of graphics processing units (GPUs) increasingly available in commodity multicore systems, there is a pressing need to harness that power for important numerical libraries such as LAPACK. In this paper, we consider the solution of dense symmetric and Hermitian eigenproblems by the LAPACK divide-and-conquer algorithm on such modern heterogeneous systems. We focus on how to make the best use of the individual strengths of massively parallel manycore GPUs and multicore CPUs. The resulting algorithm overcomes performance bottlenecks faced by current implementations that are optimized for homogeneous multicore systems. On a dual-socket quad-core Intel Xeon 2.33 GHz system with an NVIDIA GTX 280 GPU, we obtain up to about a tenfold improvement in performance for the complete dense problem. The techniques described here thus serve as an example of how to develop numerical software that uses heterogeneous architectures efficiently. As heterogeneity becomes more common in architecture design, the significance of and need for this work are expected to grow.