Low-Power Multiple-Precision Iterative Floating-Point Multiplier with SIMD Support

Authors:
Dimitri Tan; Carl E. Lemonds; Michael J. Schulte
Affiliations:
Advanced Micro Devices Inc., Austin; Advanced Micro Devices Inc., Austin; University of Wisconsin-Madison, Madison
Venue:
IEEE Transactions on Computers
Year:
2009

Cited by (5):
Energy-Efficient Multiple-Precision Floating-Point Multiplier for Embedded Applications (Journal of Signal Processing Systems)
Efficient multimedia coprocessor with enhanced SIMD engines for exploiting ILP and DLP (Parallel Computing)
An exact method for estimating maximum errors of multi-mode floating-point iterative booth multiplier (International Journal of Computational Science and Engineering)
Performance effects of pipeline architecture on an FPGA-based binary32 floating point multiplier (Microprocessors & Microsystems)
Ultra-low-power adder stage design for exascale floating point units (ACM Transactions on Embedded Computing Systems (TECS) - Special Issue on Design Challenges for Many-Core Processors, Special Section on ESTIMedia'13 and Regular Papers)

Abstract

The demand for improved SIMD floating-point performance on general-purpose x86-compatible microprocessors is rising. At the same time, there is a conflicting demand in the low-power computing market for a reduction in power consumption. Along with this, there is the absolute necessity of backward compatibility for x86-compatible microprocessors, which includes the support of x87 scientific floating-point instructions. The combined effect is that there is a need for low-power, low-cost floating-point units that are still capable of delivering good SIMD performance while maintaining full x86 functionality. This paper presents the design of an x86-compatible floating-point multiplier (FPM) that is compliant with the IEEE-754 Standard for Binary Floating-Point Arithmetic [12] and is specifically tailored to provide good SIMD performance in a low-cost, low-power solution while maintaining full x87 backward compatibility. The FPM efficiently supports multiple precisions using an iterative rectangular multiplier. The FPM can perform two parallel single-precision multiplies every cycle with a latency of two cycles, one double-precision multiply every two cycles with a latency of four cycles, or one extended-double-precision multiply every three cycles with a latency of five cycles. The iterative FPM also supports division, square-root, and transcendental functions. Compared to a previous design with similar functionality, the proposed iterative FPM has 60 percent less area and 59 percent less dynamic power dissipation.
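The iterative idea in the abstract can be illustrated with a minimal arithmetic sketch. It assumes a rectangular array that takes the full multiplicand but only a fixed-width slice of the multiplier on each pass, accumulating shifted partial products across passes. The slice width of 27 bits is an assumption (the abstract does not state the array dimensions); it is chosen only because it yields one, two, and three passes for 24-bit, 53-bit, and 64-bit significands, which matches the issue intervals quoted above. The names `SLICE_BITS` and `iterative_multiply` are hypothetical, and the sketch ignores Booth encoding, carry-save accumulation, rounding, and exponent handling.

```python
SLICE_BITS = 27  # assumed multiplier-slice width; not given in the abstract


def iterative_multiply(a, b, sig_bits, slice_bits=SLICE_BITS):
    """Multiply unsigned significands a and b, consuming slice_bits of b per pass.

    Returns (product, passes). Requires 0 <= b < 2**sig_bits.
    """
    if not 0 <= b < (1 << sig_bits):
        raise ValueError("b does not fit in sig_bits")
    passes = -(-sig_bits // slice_bits)      # ceiling division
    mask = (1 << slice_bits) - 1
    acc = 0
    for i in range(passes):
        shift = i * slice_bits
        chunk = (b >> shift) & mask          # next slice of the multiplier
        acc += (a * chunk) << shift          # one rectangular partial product
    return acc, passes


# Significand widths from IEEE 754 / x87: single 24, double 53, extended 64 bits.
for name, bits in (("single", 24), ("double", 53), ("extended", 64)):
    a = (1 << (bits - 1)) | 12345            # leading bit set, like a normalized significand
    b = (1 << (bits - 1)) | 67890
    product, passes = iterative_multiply(a, b, bits)
    print(name, passes, product == a * b)
```

Under this assumption a single-precision significand fits in one pass, which is consistent with the abstract's claim of starting single-precision multiplies every cycle, while wider formats reuse the same array over additional passes instead of requiring a full-width multiplier.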