1-GHz HAL SPARC64® Dual Floating Point Unit with RAS Features

Authors:
Ajay Naini, Atul Dhablania, Warren James, Debjit Das Sarma
Venue:
ARITH '01: Proceedings of the 15th IEEE Symposium on Computer Arithmetic
Year:
2001

Abstract

An IEEE 754-compliant, 1-GHz Sparc64-V floating-point unit (FPU) with reliability, availability, and serviceability (RAS) features and partial support for denormal operands and results is presented. The FPU has two functional units, each containing an adder (FADD) and a multiplier (FMUL); one of them additionally contains a graphics unit (VIS). Two floating-point instructions can be scheduled out of order each cycle, one to each functional unit. A peak performance of 4 GFLOPS is achieved by scheduling two floating-point multiply-add (FMA) instructions each cycle. The FADD unit is fully pipelined and can execute an addition, subtraction, conversion, or compare instruction every cycle. The FMUL unit executes multiply instructions in a pipelined fashion, and divide and square-root instructions are executed by making multiple iterations through the multiplier pipeline. The VIS unit is also pipelined and executes SIMD fixed-point graphics instructions. The adder and multiplier have latencies of 3 and 4 cycles, respectively. The paper presents novel techniques in the adder and multiplier implementations that reduce area and cycle time. The FPU provides RAS support for enhanced server reliability through selective parity-based error detection. The FPU is implemented in a 0.15-µm, six-layer-metal CMOS technology.
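As a worked check of the peak-performance figure, count each fused multiply-add as two floating-point operations (one multiply and one add), the usual convention for peak ratings, with one FMA issued to each of the two functional units per 1-GHz cycle:

```latex
2\,\frac{\text{FMA}}{\text{cycle}} \times 2\,\frac{\text{FLOP}}{\text{FMA}} \times 10^{9}\,\frac{\text{cycles}}{\text{s}}
  = 4 \times 10^{9}\,\frac{\text{FLOP}}{\text{s}} = 4\ \text{GFLOPS}
```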
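The abstract states only that divide and square root are executed by making multiple passes through the multiplier pipeline; it does not name the iteration used. As a sketch under that gap, the code below assumes Newton-Raphson reciprocal refinement, one common multiplier-based scheme (Goldschmidt iteration is another). The function name `nr_divide`, the linear seed 48/17 - (32/17)*d, and the choice of four iterations are illustrative assumptions, not details from the paper. The result is also not guaranteed to be correctly rounded; an IEEE 754-compliant divider would need an additional rounding-correction step.

```python
import math


def nr_divide(a: float, b: float, iterations: int = 4) -> float:
    """Approximate a / b using only multiplies and subtracts (Newton-Raphson).

    Hypothetical illustration: the seed and iteration count are assumptions,
    not taken from the Sparc64-V design. Special values (inf, NaN) and
    overflow/underflow edge cases are not handled.
    """
    if b == 0.0:
        raise ZeroDivisionError("division by zero")
    # Split |b| = d * 2**e with 0.5 <= d < 1 so the seed works on a fixed range.
    d, e = math.frexp(abs(b))
    # Linear seed: relative error at most 1/17 for d in [0.5, 1).
    x = 48.0 / 17.0 - (32.0 / 17.0) * d
    for _ in range(iterations):
        # Each step roughly squares the relative error: x <- x * (2 - d * x).
        x = x * (2.0 - d * x)
    # x approximates 1/d, so 1/|b| = x * 2**-e; then restore the sign of b.
    recip = math.ldexp(x, -e)
    if b < 0:
        recip = -recip
    # One final multiply forms the quotient.
    return a * recip
```

For example, `nr_divide(1.0, 3.0)` should land within a few units in the last place of `1.0 / 3.0`. Because each iteration roughly squares the relative error, the seed's error of 1/17 drops to about (1/17)^16, below double-precision resolution, after four steps; three steps (about 1.4e-10) would not suffice.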
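The abstract attributes the FPU's RAS support to selective parity error detection but does not say which registers or datapaths carry parity, or at what granularity. The sketch below assumes one even-parity bit per byte of a 64-bit operand, a hypothetical granularity chosen for illustration, and the helper names `byte_parity` and `parity_ok` are likewise hypothetical. It shows both what parity catches (any odd number of flipped bits within a byte) and what it misses (an even number of flips within the same byte).

```python
def byte_parity(word: int) -> int:
    """Return 8 parity bits for a 64-bit word; bit i covers byte i.

    Hypothetical granularity (one even-parity bit per byte), used only to
    illustrate parity-based error detection.
    """
    bits = 0
    for i in range(8):
        byte = (word >> (8 * i)) & 0xFF
        # Even parity: the parity bit is the XOR of the byte's data bits.
        bits |= (bin(byte).count("1") & 1) << i
    return bits


def parity_ok(word: int, stored: int) -> bool:
    """Recompute parity and compare it with the parity stored alongside the word."""
    return byte_parity(word) == stored


if __name__ == "__main__":
    w = 0x0123456789ABCDEF
    p = byte_parity(w)
    print(parity_ok(w, p))              # True: no error
    print(parity_ok(w ^ (1 << 13), p))  # False: single-bit flip in byte 1 is detected
    print(parity_ok(w ^ 0b11, p))       # True: two flips in byte 0 go undetected
```

Parity detects but cannot correct an error, which is consistent with the abstract's description of it as error detection.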