On accurate floating-point summation

  • Authors: Michael A. Malcolm
  • Affiliations: Stanford Univ., CA
  • Venue: Communications of the ACM
  • Year: 1971

Abstract

The accumulation of floating-point sums is considered on a computer which performs $t$-digit base $\beta$ floating-point addition with exponents in the range $-m$ to $M$. An algorithm is given for accurately summing $n$ $t$-digit floating-point numbers. Each of these $n$ numbers is split into $q$ parts, forming $q \cdot n$ $t$-digit floating-point numbers. Each of these is then added to the appropriate one of $\eta$ auxiliary $t$-digit accumulators. Finally, the accumulators are added together to yield the computed sum. In all, $q \cdot n + \eta - 1$ $t$-digit floating-point additions are performed. Let $\nu = \lceil (M + m + 1)/(\eta + 1) \rceil$. If $n \le (1/q)\,\beta^{\lfloor ((q-1)/q)\,t \rfloor - \nu + 1}$ (*), then the relative error in the computed sum is at most $\lceil (t + 1)/\nu \rceil\,\beta^{1-t}$. Further, with an additional $q + \eta - 1$ $t$-digit additions, the computed sum can be corrected to full $t$-digit accuracy.

For example, for the IBM/360 ($\beta = 16$, $t = 14$, $M = 63$, $m = 64$), typical values for $q$ and $\eta$ are $q = 2$ and $\eta = 32$. In this case, (*) becomes $n \le \tfrac{1}{2} \times 16^{4} = 32{,}768$, and we have $\lceil (t + 1)/\nu \rceil\,\beta^{1-t} = 4 \times 16^{-13}$.
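
The IBM/360 figures quoted in the abstract can be rechecked with a few lines of exact rational arithmetic. The following is only a sketch of the abstract's formulas, not an implementation of the summation algorithm itself. The function names `malcolm_bounds` and `addition_count` are hypothetical, and the inner bracket in condition (*) is read here as a floor, which is an assumption (for q = 2 and t = 14 a floor and a ceiling give the same value).

```python
from fractions import Fraction
from math import ceil, floor  # both are exact when applied to a Fraction


def malcolm_bounds(beta, t, M, m, q, eta):
    """Evaluate the quantities stated in the abstract (hypothetical helper).

    Returns (nu, n_max, rel_err) where
      nu      = ceil((M + m + 1) / (eta + 1))
      n_max   = (1/q) * beta**(floor(((q-1)/q) * t) - nu + 1)   # condition (*)
      rel_err = ceil((t + 1) / nu) * beta**(1 - t)
    The floor in n_max is an assumed reading of the source expression.
    """
    nu = ceil(Fraction(M + m + 1, eta + 1))
    exponent = floor(Fraction((q - 1) * t, q)) - nu + 1
    n_max = Fraction(beta) ** exponent / q
    rel_err = ceil(Fraction(t + 1, nu)) * Fraction(beta) ** (1 - t)
    return nu, n_max, rel_err


def addition_count(n, q, eta):
    """Number of t-digit additions for the basic sum: q*n + eta - 1."""
    return q * n + eta - 1


# IBM/360 parameters from the abstract.
nu, n_max, rel_err = malcolm_bounds(beta=16, t=14, M=63, m=64, q=2, eta=32)
print(nu, n_max)                      # 4 32768
print(rel_err == Fraction(4, 16**13)) # True
```

With these parameters the sketch reproduces the abstract's values: ν = 4, n ≤ 32,768, and a relative error bound of 4 × 16^(-13). Using `Fraction` rather than floats keeps the powers of β exact, so the comparison against the quoted bound is an equality test rather than an approximate one.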