An integrated reduction technique for a double precision accumulator

  • Authors:
  • Krishna K. Nagar;Yan Zhang;Jason D. Bakos

  • Affiliations:
  • University of South Carolina, Columbia, SC;University of South Carolina, Columbia, SC;University of South Carolina, Columbia, SC

  • Venue:
  • Proceedings of the Third International Workshop on High-Performance Reconfigurable Computing Technology and Applications
  • Year:
  • 2009

Quantified Score

Hi-index 0.00

Visualization

Abstract

The accumulation operation, An+1 = An + X, is perhaps one of the most fundamental and widely-used operations in numerical mathematics and digital signal processing. However, designing double-precision floating-point accumulators presents a unique set of challenges: double-precision addition is usually deeply pipelined and without special micro-architectural or data scheduling techniques, the data hazard that exists between An+1 and An requires that each new value of X delivered to the accumulator wait for the latency of the adder. There have been several techniques proposed for alleviating this problem, but each carries significant overheads and/or restrictions on input characteristics. In this paper we present a design for a double precision accumulator that requires no timing overhead relative to the underlying add operation. We achieve this by integrating a coalescing reduction circuit within the low-level design of a base-converting floating-point adder. To demonstrate our accumulator design, we use it in a sparse matrix vector multiplication architecture, achieving a throughput of up to 3.7 GFLOPS.