Ultimately Fast Accurate Summation

Authors:
Siegfried M. Rump
Affiliations:
rump@tu-harburg.de
Venue:
SIAM Journal on Scientific Computing
Year:
2009

Citing 0
Cited 7

Algorithm 908: Online Exact Summation of Floating-Point Streams
ACM Transactions on Mathematical Software (TOMS)

Accuracy versus time: a case study with summation algorithms
Proceedings of the 4th International Workshop on Parallel and Symbolic Computation

Error-free transformations of matrix multiplication by using fast routines of matrix multiplication and its applications
Numerical Algorithms

PerPI: a tool to measure instruction level parallelism
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume Part I

Comments on fast and exact accumulation of products
PARA'10 Proceedings of the 10th international conference on Applied Parallel and Scientific Computing - Volume 2

Accurate solution of dense linear systems, part I: Algorithms in rounding to nearest
Journal of Computational and Applied Mathematics

Self-Alignment Schemes for the Implementation of Addition-Related Floating-Point Operators
ACM Transactions on Reconfigurable Technology and Systems (TRETS)

Abstract

We present two new algorithms FastAccSum and FastPrecSum, one to compute a faithful rounding of the sum of floating-point numbers and the other for a result “as if” computed in $K$-fold precision. Faithful rounding means the computed result either is one of the immediate floating-point neighbors of the exact result or is equal to the exact sum if this is a floating-point number. The algorithms are based on our previous algorithms AccSum and PrecSum and improve them by up to 25%. The first algorithm adapts to the condition number of the sum; i.e., the computing time is proportional to the difficulty of the problem. The second algorithm does not need extra memory, and the computing time depends only on the number of summands and $K$. Both algorithms are the fastest known in terms of flops. They allow good instruction-level parallelism so that they are also fast in terms of measured computing time. The algorithms require only standard floating-point addition, subtraction, and multiplication in one working precision, for example, double precision.
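
The notions of faithful rounding and of summation using only working-precision addition and subtraction can be illustrated with a small sketch. The code below is not FastAccSum or FastPrecSum. It uses the classical TwoSum error-free transformation and a simple compensated summation, which in general does not guarantee a faithfully rounded result. The function names `two_sum`, `compensated_sum`, `naive_sum`, and `is_faithful` are hypothetical, and the faithfulness check assumes finite inputs away from the overflow threshold and Python 3.9 or later (for `math.nextafter`).

```python
import math
from fractions import Fraction


def two_sum(a, b):
    """TwoSum: return (s, e) with s = fl(a + b) and a + b = s + e exactly.

    Uses only floating-point additions and subtractions in working
    precision; assumes no overflow occurs.
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(xs):
    """Plain recursive summation in working precision."""
    s = 0.0
    for x in xs:
        s = s + x
    return s


def compensated_sum(xs):
    """Compensated summation: accumulate the rounding errors from TwoSum
    separately and add them back at the end. Illustrative only; this is
    not the paper's algorithm and is not always faithful."""
    s = 0.0
    c = 0.0
    for x in xs:
        s, e = two_sum(s, x)
        c = c + e
    return s + c


def is_faithful(result, exact):
    """Check the abstract's definition of faithful rounding.

    `exact` is a Fraction holding the true sum. The result is faithful if
    it equals the exact sum, or if the exact sum lies strictly between the
    result and its adjacent floating-point number.
    """
    r = Fraction(result)
    if r == exact:
        return True
    direction = math.inf if exact > r else -math.inf
    neighbor = Fraction(math.nextafter(result, direction))
    return (r < exact < neighbor) or (neighbor < exact < r)


# An ill-conditioned sum: heavy cancellation hides the small terms.
data = [1e16, 1.0, -1e16, 1.0]
exact = sum(Fraction(x) for x in data)  # exact rational sum, equals 2

print(naive_sum(data), is_faithful(naive_sum(data), exact))
print(compensated_sum(data), is_faithful(compensated_sum(data), exact))
```

On this input the plain loop loses one of the small summands, while the compensated loop recovers it. For sufficiently ill-conditioned data the compensated loop also fails, which is the gap that algorithms with a guaranteed faithful result are designed to close.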