Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, in the worst case, a high-latency hardware floating-point divider can add 0.50 cycles per instruction (CPI) to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency across a range of instruction issue rates. It also examines the performance implications of sharing multiplication hardware with the divider, sharing hardware between division and square root, on-the-fly rounding and conversion, and fused functional units. Using a system-level study as a basis, it shows how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.
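As a rough illustration of how divider latency turns into lost CPI, the sketch below uses a simple first-order model. This model is an assumption made for illustration and is not the paper's simulation methodology: each divide stalls the pipeline for its latency minus any cycles hidden by independent work. The function names, the 1% divide frequency, and the 50-cycle latency are hypothetical, chosen only so the arithmetic lands on the 0.50 CPI worst case quoted in the abstract. The model also assumes an ideal base CPI of 1/issue-width, so it shows why the same stall cycles weigh more heavily on wider-issue machines.

```python
"""First-order model of the CPI cost of exposed FP-divide latency (hypothetical sketch)."""


def divide_cpi_penalty(div_freq: float, div_latency: float, overlap_cycles: float = 0.0) -> float:
    """Return the extra cycles per instruction caused by divide stalls.

    div_freq: fraction of all executed instructions that are FP divides.
    div_latency: divider latency in cycles.
    overlap_cycles: cycles of each divide hidden by independent work;
        0.0 models the worst case, where the whole latency is exposed.
    """
    if not 0.0 <= div_freq <= 1.0:
        raise ValueError("div_freq must be a fraction between 0 and 1")
    exposed = max(div_latency - overlap_cycles, 0.0)  # stall cycles per divide
    return div_freq * exposed


def divide_time_fraction(base_cpi: float, div_penalty: float) -> float:
    """Return the share of total execution time spent stalled on divides."""
    return div_penalty / (base_cpi + div_penalty)


if __name__ == "__main__":
    # Hypothetical workload: 1% FP divides, 50-cycle divider, no overlap.
    penalty = divide_cpi_penalty(div_freq=0.01, div_latency=50)
    print(f"extra CPI from divides: {penalty:.2f}")

    # Same stall penalty at different ideal base CPIs (assumed 1 / issue width).
    for issue_width in (1, 2, 4):
        base_cpi = 1.0 / issue_width
        share = divide_time_fraction(base_cpi, penalty)
        print(f"issue width {issue_width}: divides take {share:.0%} of run time")
```

Under these assumed inputs the script reports an extra CPI of 0.50, and the share of run time lost to divide stalls grows from about 33% at single issue to about 67% at four-wide issue. Real pipelines hide part of the divide latency behind independent instructions; the `overlap_cycles` parameter approximates that only crudely.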