This brief presents a dedicated unit for the combined operation of floating-point (FP) division followed by addition or subtraction: the divide-add fused (DAF) unit. The unit aims to improve the performance and accuracy of applications in which this combined operation is frequent, such as the interval Newton method and polynomial approximation. The proposed DAF unit has an architecture similar to that of FP multiply-accumulate units; the main difference is the divider, which is implemented using digit-recurrence algorithms. An important design tradeoff in DAF is the number of quotient bits required. We present the impact of the number of quotient bits on accuracy, cost, and performance, and accordingly propose two implementations: one favoring accuracy and one favoring performance. We show that the proposed implementations are more accurate than a solution based on two distinct units, an FP divider and an FP adder. The implementation targeting lower latency offers the best cost-performance tradeoff.
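The accuracy argument can be illustrated in software. The following is a minimal sketch, not the hardware design described above: it assumes an idealized fused operation that computes a/b + c exactly and rounds once, whereas the DAF unit keeps only a finite number of quotient bits. The function names `div_add_separate`, `div_add_fused`, and `abs_error` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction


def div_add_separate(a: float, b: float, c: float) -> float:
    """Model two distinct units: the quotient is rounded, then the sum is rounded."""
    return a / b + c


def div_add_fused(a: float, b: float, c: float) -> float:
    """Idealized fused model (assumption): exact a/b + c, rounded once to a double."""
    exact = Fraction(a) / Fraction(b) + Fraction(c)
    # Converting a Fraction to float rounds to the nearest double.
    return float(exact)


def abs_error(result: float, a: float, b: float, c: float) -> Fraction:
    """Exact absolute error of `result` with respect to the true value of a/b + c."""
    exact = Fraction(a) / Fraction(b) + Fraction(c)
    return abs(Fraction(result) - exact)


# Cancellation case: c is the negated, already rounded quotient of a and b.
a, b = 1.0, 3.0
c = -(1.0 / 3.0)

separate = div_add_separate(a, b, c)  # the rounding error of 1/3 is lost
fused = div_add_fused(a, b, c)        # the rounding error of 1/3 survives

print("separate:", separate)
print("fused:   ", fused)
```

In this cancellation case the two-unit model returns exactly 0.0, while the single-rounding model returns a small positive value (about 1.85e-17), which is the rounding error of the quotient that the separate divider discards. The sketch handles finite inputs with a nonzero divisor only; infinities, NaNs, and division by zero would need separate handling.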