We investigate the numerical computation of the matrix sign function of large-scale dense matrices, a common task in several application areas. The main computational work in Newton's iteration for the matrix sign function consists of matrix inversion. We therefore investigate the performance of two approaches to matrix inversion, based on Gaussian elimination (LU factorization) and Gauss-Jordan elimination. The target architecture is a current general-purpose multi-core processor connected to a graphics processor. Parallelism is extracted on both processors by linking sequential versions of the codes with multithreaded implementations of BLAS. Our results on a system with two Intel Quad-Core processors and an NVIDIA Tesla C1060 illustrate the performance and scalability attained by the codes on this platform.
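The Newton iteration referred to in the abstract is the classical scheme X_{k+1} = (X_k + X_k^{-1})/2 with X_0 = A, whose per-step cost is dominated by one dense matrix inversion. A minimal NumPy sketch (function name, tolerance, and iteration cap are illustrative, not from the paper; the paper's implementations instead offload the inversion to multithreaded BLAS or the GPU):

```python
import numpy as np

def matrix_sign_newton(A, tol=1e-12, max_iter=100):
    """Newton iteration for the matrix sign function:
        X_{k+1} = (X_k + X_k^{-1}) / 2,  X_0 = A.
    A must have no eigenvalues on the imaginary axis.
    The dominant cost per step is the matrix inversion."""
    X = np.array(A, dtype=float)
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        # Stop when the iterates stagnate in a relative sense.
        if np.linalg.norm(X_new - X, 1) <= tol * np.linalg.norm(X_new, 1):
            return X_new
        X = X_new
    return X

# For a diagonalizable matrix, sign(A) has eigenvalues +/-1 matching
# the signs of the real parts of A's eigenvalues:
A = np.diag([2.0, -3.0, 5.0])
S = matrix_sign_newton(A)  # approximately diag(1, -1, 1)
```

Replacing `np.linalg.inv` with an LU-based or Gauss-Jordan inversion routine is exactly the design choice the paper benchmarks on the CPU/GPU platform.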