We consider the efficient and accurate computation of Givens rotations. When f and g are positive real numbers, this simply amounts to computing the values of $c = f/\sqrt{f^2 + g^2}$, $s = g/\sqrt{f^2 + g^2}$, and $r = \sqrt{f^2 + g^2}$. This apparently trivial computation merits closer consideration for the following three reasons. First, while the definitions of c, s and r seem obvious in the case of two nonnegative arguments f and g, there is enough freedom of choice when one or more of f and g are negative, zero or complex that LAPACK auxiliary routines SLARTG, CLARTG, SLARGV and CLARGV can compute rather different values of c, s and r for mathematically identical values of f and g. To eliminate this unnecessary ambiguity, the BLAS Technical Forum chose a single consistent definition of Givens rotations that we will justify here. Second, computing accurate values of c, s and r as efficiently as possible and reliably despite over/underflow is surprisingly complicated. For complex Givens rotations, the most efficient formulas require only one real square root and one real divide (as well as several much cheaper additions and multiplications), but a reliable implementation using only working precision has a number of cases. On a Sun Ultra-10, the new implementation is slightly faster than the previous LAPACK implementation in the most common case, and 2.7 to 4.6 times faster than the corresponding vendor, reference or ATLAS routines. It is also more reliable; all previous codes occasionally suffer from large inaccuracies due to over/underflow. For real Givens rotations, there are also improvements in speed and accuracy, though not as striking. Third, the design process that led to this reliable implementation is quite systematic, and could be applied to the design of similarly reliable subroutines.
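To make the real case concrete, here is a minimal Python sketch of a Givens rotation for real f and g. It is an illustration only, not the implementation described above. The function name `givens` is hypothetical, the sign convention (c nonnegative, r taking the sign of f) is an assumption chosen for this example, and `math.hypot` stands in for the careful scaling logic a working-precision routine would need to avoid over/underflow.

```python
import math


def givens(f, g):
    """Return (c, s, r) such that c*f + s*g == r and -s*f + c*g == 0, up to rounding.

    Assumed convention for this sketch: c >= 0 and r has the sign of f.
    """
    if g == 0.0:
        # Nothing to eliminate: the identity rotation.
        return 1.0, 0.0, f
    if f == 0.0:
        # Pure swap; the sign of s is chosen so that r = |g|.
        return 0.0, math.copysign(1.0, g), abs(g)
    # math.hypot computes sqrt(f*f + g*g) without forming the squares directly,
    # so it does not overflow for inputs such as 1e200.
    r = math.copysign(math.hypot(f, g), f)
    return f / r, g / r, r
```

For example, `givens(3.0, 4.0)` returns `(0.6, 0.8, 5.0)`. With `f = g = 1e200`, the naive expression `math.sqrt(f*f + g*g)` overflows to infinity, while the sketch still returns a finite r.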