Comparison of Single- and Dual-Pass Multiply-Add Fused Floating-Point Units

Authors:
Romesh M. Jessani; Michael Putrino
Affiliations:
Ross Technology, Inc., Austin, TX; IBM Corp., Austin, TX
Venue:
IEEE Transactions on Computers
Year:
1998

Abstract

Low power, low cost, and high performance dictate the design of many microprocessors targeted at the low-power computing market. The floating-point unit occupies a significant percentage of a microprocessor's silicon area because of its wide data bandwidth (for double-precision computations) and the area occupied by the multiply array. For microprocessors designed for portable products, the size of the floating-point unit plays an important role in keeping cost low, since cost is driven by chip area. Some microprocessors have multiply-add fused floating-point units with a reduced multiply array, requiring two passes through the array for operations involving double-precision multiplies. This paper discusses the design complexities around the dual-pass multiply array and its effect on area and performance. Floating-point unit areas and their associated multiply array areas are compared for a single-pass and a dual-pass implementation in a given technology (the PowerPC 604e™ and PowerPC 603e™ microprocessors, respectively).
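
The arithmetic idea behind a dual-pass multiply can be sketched in a few lines. The sketch below is illustrative only and is not taken from the paper: the function names and the `split_bits` value are hypothetical, and the actual array widths of the PowerPC 603e and 604e are not stated in the abstract. It assumes the multiplier significand is split into a low and a high part, each part is multiplied by the full multiplicand in a separate pass, and the two partial results are combined with a shift.

```python
def single_pass_multiply(a: int, b: int) -> int:
    """Model of a full-width array: the whole product is formed in one pass."""
    return a * b


def dual_pass_multiply(a: int, b: int, split_bits: int = 27) -> int:
    """Model of a reduced array that sees only part of the multiplier per pass.

    `split_bits` is a hypothetical split point, not a figure from the paper.
    `a` and `b` are non-negative integer significands.
    """
    mask = (1 << split_bits) - 1
    b_low = b & mask            # low part of the multiplier
    b_high = b >> split_bits    # high part of the multiplier

    pass1 = a * b_low           # first pass through the reduced array
    pass2 = a * b_high          # second pass through the same array

    # Align the second partial result and accumulate it with the first.
    return pass1 + (pass2 << split_bits)


if __name__ == "__main__":
    a = (1 << 53) - 1                   # 53-bit double-precision significand
    b = (1 << 52) | 0x123456789ABCD     # another 53-bit significand
    assert dual_pass_multiply(a, b) == single_pass_multiply(a, b)
```

The result is identical in both cases; the trade-off the abstract describes is that the dual-pass version reuses a smaller array at the cost of a second trip through it.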