Floating-point divide and square-root operations are essential to many scientific and engineering applications, and are required in all computer systems that support the IEEE floating-point standard. Yet many current microprocessors provide only weak support for these operations. The latency and throughput of division are typically far inferior to those of floating-point addition and multiplication, and square-root performance is often worse still. This article argues the case for high-performance division and square root. It also explains the algorithms and implementations of the two primary classes of technique employed in microprocessor floating-point units, subtractive and multiplicative methods, together with their associated area/performance tradeoffs. Case studies of representative floating-point unit configurations are presented, supported by simulation results using a carefully selected benchmark, Givens rotation, to show the dynamic performance impact of the various implementation alternatives. The topology of the implementation is found to be an important performance factor. Multiplicative algorithms, such as the Newton-Raphson method and Goldschmidt's algorithm, can achieve low latencies. However, these implementations serialize multiply, divide, and square-root operations through a single pipeline, which can lead to low throughput. While this hardware sharing keeps the area of baseline implementations small, lower-latency versions require many times more area. For these reasons, multiplicative implementations are best suited to cases where subtractive methods are precluded by area constraints and modest performance on divide and square-root operations is tolerable. Subtractive algorithms, exemplified by radix-4 SRT and radix-16 SRT, can be made to execute in parallel with other floating-point operations.
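To make the multiplicative methods named above concrete, the following is a minimal software sketch of Newton-Raphson and Goldschmidt division. It is an illustration under stated assumptions, not the implementations evaluated in the article: the function names, the linear seed `48/17 - 32/17 * m` (standing in for a hardware reciprocal lookup table), the fixed iteration count, and the restriction to positive finite divisors are all choices made here. Both routines use only multiplications and subtractions inside the loop, which is why such designs can share the multiplier pipeline.

```python
import math


def _seed_reciprocal(m):
    """Linear first guess for 1/m with m in [0.5, 1); relative error is at most 1/17.

    Assumption: this stands in for the lookup table a hardware unit would use.
    """
    return 48.0 / 17.0 - (32.0 / 17.0) * m


def newton_raphson_divide(a, b, iterations=5):
    """Approximate a / b (b > 0) by refining an estimate of 1/b."""
    if not (b > 0.0 and math.isfinite(b)):
        raise ValueError("this sketch only handles positive finite divisors")
    m, e = math.frexp(b)            # b = m * 2**e with m in [0.5, 1)
    x = _seed_reciprocal(m)
    for _ in range(iterations):
        # Newton-Raphson step for f(x) = 1/x - m; the error is squared each pass.
        x = x * (2.0 - m * x)
    return math.ldexp(a * x, -e)    # a * (1/m) * 2**(-e)


def goldschmidt_divide(a, b, iterations=5):
    """Approximate a / b (b > 0) by driving the denominator toward 1."""
    if not (b > 0.0 and math.isfinite(b)):
        raise ValueError("this sketch only handles positive finite divisors")
    m, e = math.frexp(b)
    n = math.ldexp(a, -e)           # numerator scaled by the same power of two
    d = m
    f = _seed_reciprocal(d)         # first scaling factor
    n, d = n * f, d * f
    for _ in range(iterations):
        # Multiply numerator and denominator by the same factor: n/d is unchanged,
        # while d converges quadratically to 1, so n converges to the quotient.
        f = 2.0 - d
        n, d = n * f, d * f
    return n


if __name__ == "__main__":
    print(newton_raphson_divide(355.0, 113.0))
    print(goldschmidt_divide(355.0, 113.0))
```

Neither routine produces a correctly rounded IEEE result; a real floating-point unit adds a final rounding step and handles signs, zeros, infinities, and NaNs. Note that the two multiplications in each Goldschmidt step are independent of each other, whereas the Newton-Raphson step is a dependent chain, which is one reason the two methods pipeline differently.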