An FPGA implementation of pipelined multiplicative division with IEEE Rounding

  • Authors:
  • Ronen Goldberg;Guy Even;Peter-M. Seidel

  • Affiliations:
  • Tel-Aviv University, Israel;Tel-Aviv University, Israel;Southern Methodist University, USA

  • Venue:
  • FCCM '07 Proceedings of the 15th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

We report the results of an FPGA implementation of double precision floating-point division with IEEE rounding. We achieve a total latency (i.e., cycles times clock period) that is 2:6 times smaller than the latency of the fastest previous implementation on FPGAs. The amount of hardware, on the other hand, is comparable to commercial cores. The division circuit is based on Goldschmidt's algorithm. All IEEE rounding modes are supported and are implemented using dewpoint rounding. The precision of the initial approximation of the reciprocal is 14 bits. To save hardware and reduce the critical path, a half-sized 62x30 Booth radix-8 multiplier is used. This multiplier can receive both the multiplicand and the multiplier in carry-save representation. The division circuit is partitioned into four pipeline stages, has a latency of 11 cycles, and may restart a new double precision division operation after 8 cycles. Synthesis results of an implementation (not including the computation of the initial approximation of the reciprocal and the exponent path) guarantee a clock frequency of 131 MHz on an Altera Stratix II using 3592 ALMs. The implementation was successfully tested with over 10 million random vectors as well as over a million hard-to-round vectors.