High-Speed Double-Precision Computation of Reciprocal, Division, Square Root and Inverse Square Root

Authors:
José-Alejandro Piñeiro;Javier Díaz Bruguera
Venue:
IEEE Transactions on Computers
Year:
2002

Abstract

A new method for the high-speed computation of double-precision floating-point reciprocal, division, square root, and inverse square root operations is presented in this paper. This method employs a second-degree minimax polynomial approximation to obtain an accurate initial estimate of the reciprocal and the inverse square root values, and then performs a modified Goldschmidt iteration. The high accuracy of the initial approximation allows us to obtain double-precision results by computing a single Goldschmidt iteration, significantly reducing the latency of the algorithm. Two unfolded architectures are proposed: the first one computing only reciprocal and division operations, and the second one also including the computation of square root and inverse square root. The execution times and area costs for both architectures are estimated, and a comparison with other multiplicative-based methods is presented. The results of this comparison show the achievement of a lower latency than these methods, with similar hardware requirements.
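The two-stage idea described in the abstract (a quadratic initial estimate followed by one Goldschmidt iteration) can be illustrated with a small software sketch for division. It is not the authors' hardware design, and several details are assumptions made only for illustration: the segment count (`SEGMENT_BITS = 8`), the helper names `build_table`, `initial_estimate` and `goldschmidt_divide`, and the use of interpolation at Chebyshev nodes as a near-minimax stand-in for a true minimax fit. The iteration shown is the textbook Goldschmidt step, not the modified one the paper proposes. Operands are taken as significands in [1, 2).

```python
import math

SEGMENT_BITS = 8  # assumption: 2**8 table segments over [1, 2)


def build_table(f, bits=SEGMENT_BITS):
    """Per-segment degree-2 interpolants of f on [1, 2).

    Interpolating at the three Chebyshev nodes of each segment gives a
    near-minimax quadratic; a true minimax fit would need a Remez-style solver.
    Each entry stores the polynomial in Newton form.
    """
    n = 1 << bits
    table = []
    for i in range(n):
        a = 1.0 + i / n
        b = 1.0 + (i + 1) / n
        mid, half = (a + b) / 2, (b - a) / 2
        xs = [mid + half * math.cos((2 * k + 1) * math.pi / 6) for k in range(3)]
        ys = [f(x) for x in xs]
        d01 = (ys[1] - ys[0]) / (xs[1] - xs[0])   # first divided differences
        d12 = (ys[2] - ys[1]) / (xs[2] - xs[1])
        d012 = (d12 - d01) / (xs[2] - xs[0])      # second divided difference
        table.append((xs[0], xs[1], ys[0], d01, d012))
    return table


def initial_estimate(table, x, bits=SEGMENT_BITS):
    """Evaluate the quadratic of the segment containing x, for x in [1, 2)."""
    i = int((x - 1.0) * (1 << bits))  # the leading fraction bits select the segment
    x0, x1, y0, d01, d012 = table[i]
    return y0 + (x - x0) * (d01 + (x - x1) * d012)


def goldschmidt_divide(a, b, table, bits=SEGMENT_BITS):
    """Approximate a / b for a, b in [1, 2) with a single Goldschmidt iteration."""
    y0 = initial_estimate(table, b, bits)  # y0 is close to 1 / b
    n0 = a * y0                            # scaled numerator, close to a / b
    d0 = b * y0                            # scaled denominator, close to 1
    return n0 * (2.0 - d0)                 # the error 1 - d0 is roughly squared


if __name__ == "__main__":
    recip_table = build_table(lambda x: 1.0 / x)
    a, b = 1.75, 1.2345678
    print(goldschmidt_divide(a, b, recip_table), a / b)
```

Because the iteration roughly squares the relative error of the initial estimate, an estimate good to about 2^-28 leaves only rounding-level error after one step, which is the reason a single iteration can suffice. Passing `lambda x: x ** -0.5` to `build_table` yields an inverse-square-root estimate in the same way, although the iteration for square root differs and is not shown here.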