On computing Givens rotations reliably and efficiently

Authors:
David Bindel;James Demmel;William Kahan;Osni Marques
Affiliations:
University of California, Berkeley, Berkeley, CA;University of California, Berkeley, Berkeley, CA;University of California, Berkeley, Berkeley, CA;Lawrence Berkeley National Laboratory, Berkeley, CA
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2002

Abstract

We consider the efficient and accurate computation of Givens rotations. When f and g are positive real numbers, this simply amounts to computing the values of $c = f/\sqrt{f^2 + g^2}$, $s = g/\sqrt{f^2 + g^2}$, and $r = \sqrt{f^2 + g^2}$. This apparently trivial computation merits closer consideration for the following three reasons. First, while the definitions of c, s and r seem obvious in the case of two nonnegative arguments f and g, there is enough freedom of choice when one or more of f and g are negative, zero or complex that LAPACK auxiliary routines SLARTG, CLARTG, SLARGV and CLARGV can compute rather different values of c, s and r for mathematically identical values of f and g. To eliminate this unnecessary ambiguity, the BLAS Technical Forum chose a single consistent definition of Givens rotations that we will justify here. Second, computing accurate values of c, s and r as efficiently as possible and reliably despite over/underflow is surprisingly complicated. For complex Givens rotations, the most efficient formulas require only one real square root and one real divide (as well as several much cheaper additions and multiplications), but a reliable implementation using only working precision has a number of cases. On a Sun Ultra-10, the new implementation is slightly faster than the previous LAPACK implementation in the most common case, and 2.7 to 4.6 times faster than the corresponding vendor, reference or ATLAS routines. It is also more reliable; all previous codes occasionally suffer from large inaccuracies due to over/underflow. For real Givens rotations, there are also improvements in speed and accuracy, though not as striking. Third, the design process that led to this reliable implementation is quite systematic, and could be applied to the design of similarly reliable subroutines.
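
To make the real case concrete, the following is a minimal Python sketch of computing c, s and r so that the rotation maps (f, g) to (r, 0). It is not the paper's implementation: the function name `givens` is hypothetical, the handling of zero and negative arguments is one possible sign convention assumed here for illustration, and overflow or underflow in the intermediate squares is avoided by delegating to the standard library's `math.hypot` rather than by the explicit scaling a working-precision routine would use.

```python
import math


def givens(f: float, g: float) -> tuple:
    """Return (c, s, r) such that c*f + s*g == r and -s*f + c*g == 0.

    Sign convention (an assumption for this sketch): c is nonnegative
    and r carries the sign of f whenever f is nonzero.
    """
    if g == 0.0:
        # Nothing to eliminate: the identity rotation.
        return 1.0, 0.0, f
    if f == 0.0:
        # Pure swap: r = |g|, s = sign(g).
        return 0.0, math.copysign(1.0, g), abs(g)
    # math.hypot computes sqrt(f*f + g*g) without forming the squares
    # directly, so inputs like 1e200 do not overflow on the way.
    # (The result can still overflow if r itself is not representable.)
    h = math.hypot(f, g)
    r = math.copysign(h, f)
    return f / r, g / r, r


if __name__ == "__main__":
    f, g = 1e200, 1e200
    naive = math.sqrt(f * f + g * g)  # f*f overflows to inf
    c, s, r = givens(f, g)
    print("naive r:", naive)
    print("c, s, r:", c, s, r)
```

For f = 3 and g = 4 this yields r = 5 with c = 0.6 and s = 0.8 (up to rounding). The sketch spends two divisions and a library call, so it illustrates the reliability concern only, not the operation counts discussed in the abstract.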