This paper presents a new method for the high-speed computation of double-precision floating-point reciprocal, division, square root, and inverse square root operations. The method uses a second-degree minimax polynomial approximation to obtain an accurate initial estimate of the reciprocal and inverse square root values, and then performs a modified Goldschmidt iteration. Because the initial approximation is highly accurate, a single Goldschmidt iteration suffices to produce double-precision results, which significantly reduces the latency of the algorithm. Two unfolded architectures are proposed: the first computes only reciprocal and division operations, and the second also computes square root and inverse square root. The execution times and area costs of both architectures are estimated and compared with those of other multiplicative methods. The comparison shows that the proposed method achieves lower latency than these methods with similar hardware requirements.
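The two-stage idea in the abstract (a quadratic initial estimate followed by one Goldschmidt iteration) can be illustrated in software for the division case. The following is only a minimal sketch under stated assumptions, not the paper's design: it uses the textbook Goldschmidt step rather than the modified iteration, it replaces the true minimax fit with interpolation at Chebyshev nodes (a near-minimax stand-in), and the segment count `SEGMENTS` and all function names are hypothetical. The divisor is assumed to be already normalized to [1, 2).

```python
import math

SEGMENTS = 256  # hypothetical number of table segments on [1, 2); not from the paper


def fit_quadratic(f, a, b):
    """Interpolate f at three Chebyshev nodes of [a, b] (near-minimax stand-in).

    Returns Newton-form data (x0, x1, c0, c1, c2) such that
    p(x) = c0 + (x - x0) * (c1 + (x - x1) * c2).
    """
    mid, half = (a + b) / 2.0, (b - a) / 2.0
    x0, x1, x2 = (mid + half * math.cos((2 * k + 1) * math.pi / 6.0)
                  for k in range(3))
    y0, y1, y2 = f(x0), f(x1), f(x2)
    d01 = (y1 - y0) / (x1 - x0)          # first divided differences
    d12 = (y2 - y1) / (x2 - x1)
    return x0, x1, y0, d01, (d12 - d01) / (x2 - x0)


def build_table(f, segments=SEGMENTS):
    """One quadratic per equal-width segment of [1, 2)."""
    h = 1.0 / segments
    return [fit_quadratic(f, 1.0 + i * h, 1.0 + (i + 1) * h)
            for i in range(segments)]


RECIP_TABLE = build_table(lambda v: 1.0 / v)


def initial_reciprocal(y):
    """Piecewise-quadratic estimate of 1/y for a normalized y in [1, 2)."""
    if not 1.0 <= y < 2.0:
        raise ValueError("this sketch assumes 1 <= y < 2")
    x0, x1, c0, c1, c2 = RECIP_TABLE[int((y - 1.0) * len(RECIP_TABLE))]
    return c0 + (y - x0) * (c1 + (y - x1) * c2)


def goldschmidt_divide(x, y):
    """Approximate x / y with a single textbook Goldschmidt iteration."""
    r = initial_reciprocal(y)
    n = x * r            # numerator, already close to x / y
    d = y * r            # denominator, already close to 1
    f = 2.0 - d          # correction factor
    return n * f         # error in d is squared by this one step


if __name__ == "__main__":
    x, y = 3.0, 1.7
    print(goldschmidt_divide(x, y), x / y)
```

With this table size the initial relative error is on the order of 1e-9, so one iteration, which roughly squares that error, brings the quotient to within rounding error of double precision. A hardware design would additionally need exponent handling and a final rounding step, both of which this sketch omits.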