Exact floating-point summation and its application in shadowing

  • Authors:
  • Wayne B. Hayes; Yong-Kang Zhu

  • Affiliations:
  • University of California, Irvine;University of California, Irvine

  • Venue:
  • Exact floating-point summation and its application in shadowing
  • Year:
  • 2009

Abstract

The summation of n floating-point numbers is a basic task performed in almost every scientific or numerical computation. The relative error of "naive loop" summation can be huge. We present three new algorithms for computing correctly rounded sums of arrays of floating-point numbers. The first, iFastSum, improves upon our previous FastSum by requiring no additional space beyond the original array, which is destroyed during execution. It runs about 20% faster than FastSum in the general case and twice as fast on ill-conditioned data. The second algorithm, HybridSum, combines three summation ideas: splitting the mantissa, radix sorting into buckets, and using iFastSum. As a result, when the number of summands is greater than about 10^4, its running time for a given n is almost constant, independent of the condition number. It runs almost as fast as iFastSum in the general case and much faster than iFastSum on ill-conditioned data. HybridSum requires only one pass through the input array, asymptotically uses only 6 FLOPs per summand, and uses constant storage. Instead of splitting the mantissa as HybridSum does, our third algorithm, OnlineExactSum, doubles the number of buckets and uses the exact addition algorithm to accumulate the summands, while the local errors are handled by the additional buckets. It asymptotically requires only 5 FLOPs per summand and, thanks to instruction-level parallelism, runs only about 2 to 3 times slower than the obvious, fast-but-dumb "naive loop", independent of the condition number. Like HybridSum, it is an online algorithm: it can process an input stream of arbitrary length while requiring only constant memory. None of these algorithms requires extra-precision accumulators, and all work in any base. Their accuracy is guaranteed independent of the condition number and the number of summands.
An application to shadowing of high-dimensional n-body simulations is discussed, with new optimizations and improvements made to an existing shadowing algorithm. We also present a large number of numerical results from our shadowing work on n-body systems. Although not all of these results are thoroughly understood so far, they may be useful to other physicists and astronomers.
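
To make the summation claims in the abstract concrete, the following minimal Python sketch contrasts the "naive loop" with a correctly rounded sum on a small ill-conditioned input. It is not the authors' iFastSum, HybridSum, or OnlineExactSum. The function names (`two_sum`, `naive_sum`, `exact_reference`) and the sample data are assumptions chosen for illustration. `two_sum` is Knuth's classic error-free transformation, shown only as an example of the kind of exact addition step such algorithms build on; `math.fsum` and the `Fraction`-based reference come from the Python standard library.

```python
import math
from fractions import Fraction


def two_sum(a, b):
    """Knuth's error-free transformation (illustrative, not from the source).

    Returns (s, e) where s is the rounded sum a + b and e is the rounding
    error, so that a + b == s + e holds exactly (barring overflow).
    """
    s = a + b
    b_virtual = s - a
    a_virtual = s - b_virtual
    e = (a - a_virtual) + (b - b_virtual)
    return s, e


def naive_sum(xs):
    """The 'naive loop': accumulate left to right in working precision."""
    total = 0.0
    for x in xs:
        total += x
    return total


def exact_reference(xs):
    """Slow reference: sum exactly as rationals, then round once to a float."""
    return float(sum(Fraction(x) for x in xs))


# Hypothetical ill-conditioned input: heavy cancellation hides the small terms.
data = [1e16, 1.0, -1e16, 1.0]

naive = naive_sum(data)            # 1.0: the first 1.0 is lost to rounding
correct = math.fsum(data)          # 2.0: correctly rounded sum
reference = exact_reference(data)  # 2.0: agrees with math.fsum

s, e = two_sum(1e16, 1.0)          # (1e16, 1.0): the lost bit is captured in e
```

On this input the naive loop returns half the true sum, while the rounding error recovered by `two_sum` is exactly the term that the naive loop discards. A rational-arithmetic reference such as `exact_reference` is far too slow for real workloads; it is included only as an independent check.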