Area-efficient architectures for double precision multiplier on FPGA, with run-time-reconfigurable dual single precision support

  • Authors:
  • Manish Kumar Jaiswal;Ray C. C. Cheung

  • Affiliations:
  • Department of Electronic Engineering, City University of Hong Kong, Hong Kong;Department of Electronic Engineering, City University of Hong Kong, Hong Kong

  • Venue:
  • Microelectronics Journal
  • Year:
  • 2013

Abstract

Floating point arithmetic (FPA) is a crucial building block in many application domains, such as scientific computing, numerical analysis and signal processing. Multiplication is one of the most commonly used FPA operations. This paper presents three architectures for a double precision (D.P.) multiplier, one of which can perform run-time-reconfigurable (RTR) dual single precision (S.P.) multiplication. The first design is based on a novel block-level truncated multiplication, which reduces the number of multiplier blocks by one third while retaining high performance, and is accurate to within 1 ULP (unit in the last place) of the IEEE-754 floating-point standard precision. The second design regains the accuracy lost in the first design, using the same number of multiplier blocks plus some extra hardware, and also achieves better performance with lower latency than existing work. The third architecture can process either a single double (extended) precision operation or dual single (extended) precision operations, without any pipeline stall, and with attractive area, speed and latency results. The first design is suitable for applications with slightly lower precision requirements, whereas the other two designs are fully compliant with the accuracy of the IEEE standard. Design-1 achieves around 300 MHz on Virtex-4 (V4) and 450 MHz on Virtex-5 (V5), with only 6 DSP48 blocks and a latency of 9 cycles. Design-2 achieves about 325 MHz (V4) and 400 MHz (V5), with only 6 DSP48 blocks and full precision support. The third design achieves more than 250 MHz (V4) and 325 MHz (V5), providing on-the-fly dual precision support with a hardware requirement similar to that of double-precision-only implementations in the literature. The proposed designs compare favorably with the best floating point multipliers reported in the literature.
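
The general idea behind block-level truncated multiplication can be illustrated with a minimal Python sketch. The abstract does not give the paper's partitioning or error compensation, so everything here is an assumption made for illustration: the 18-bit chunk width, the three-way split, and the names `split_chunks` and `block_truncated_product` are hypothetical. Each 53-bit significand is split into three chunks, which gives nine partial-product blocks. Dropping the three least significant blocks leaves six, a one-third reduction.

```python
CHUNK_BITS = 18  # assumed chunk width, not taken from the paper
N_CHUNKS = 3     # 3 x 18 = 54 bits, enough for a 53-bit significand


def split_chunks(x):
    """Split an unsigned integer into N_CHUNKS chunks, least significant first."""
    mask = (1 << CHUNK_BITS) - 1
    return [(x >> (CHUNK_BITS * i)) & mask for i in range(N_CHUNKS)]


def block_truncated_product(a, b, min_weight=2):
    """Sum only the partial products a_i * b_j with i + j >= min_weight.

    Returns the (possibly truncated) product and the number of blocks used.
    min_weight=0 keeps all nine blocks and yields the exact product.
    """
    a_chunks, b_chunks = split_chunks(a), split_chunks(b)
    total, blocks_used = 0, 0
    for i, ai in enumerate(a_chunks):
        for j, bj in enumerate(b_chunks):
            if i + j >= min_weight:
                total += (ai * bj) << (CHUNK_BITS * (i + j))
                blocks_used += 1
    return total, blocks_used


# Worst case for the dropped blocks: all significand bits set.
a = b = (1 << 53) - 1
approx, used = block_truncated_product(a, b)
exact = a * b
print(used, exact - approx < (1 << 56))
```

With this naive equal split, the dropped blocks contribute less than 2^55 + 2^36 to the 106-bit product, which is still several ULPs of the 53-bit result. The sketch therefore shows only the block-count saving; it does not reproduce the 1-ULP accuracy reported for Design-1.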