An Analysis of Division Algorithms and Implementations

Authors:
Stuart F. Oberman;Michael J. Flynn
Affiliations:
-;-
Venue:
An Analysis of Division Algorithms and Implementations
Year:
1995

Citing 0
Cited 4

Area and performance tradeoffs in floating-point divide and square-root implementations
ACM Computing Surveys (CSUR)

Libraries of schedule-free operators in Alpha
ASAP '97 Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors

The VEGA Moderately Parallel MIMD, Moderately Parallel SIMD, Architecture for High Performance Array Signal Processing
IPPS '98 Proceedings of the 12th. International Parallel Processing Symposium on International Parallel Processing Symposium

A Decimal Floating-Point Divider Using Newton-Raphson Iteration
Journal of VLSI Signal Processing Systems

Abstract

Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, the increasing emphasis on high-performance graphics and the industry-wide use of performance benchmarks force processor designers to pay close attention to all aspects of floating-point computation. Many algorithms are suitable for implementing division in hardware. This paper presents four major classes of algorithms in a unified framework: digit recurrence, functional iteration, very high radix, and variable latency. Digit recurrence algorithms, the most common of which is SRT, use subtraction as the fundamental operator and converge to a quotient linearly. Division by functional iteration uses multiplication and converges to a quotient quadratically. Very high radix division algorithms are similar to digit recurrence algorithms, but they incorporate multiplication to reduce latency. Variable latency division algorithms reduce the average latency of forming the quotient. These algorithms are explained and compared in this work. For low-cost implementations where chip area must be minimized, digit recurrence algorithms are found to be suitable. An implementation of division by functional iteration can provide the lowest latency for typical multiplier latencies. Variable latency algorithms show promise for minimizing average latency and area simultaneously.
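
The contrast between linear and quadratic convergence can be illustrated with a small software sketch. This is not the paper's hardware design: it uses a plain radix-2 restoring recurrence as the simplest stand-in for the digit recurrence class (SRT itself uses a redundant digit set), and a Newton-Raphson reciprocal iteration as one example of functional iteration. The function names, the use of exact rationals, and the linear initial reciprocal estimate are all assumptions made for illustration.

```python
from fractions import Fraction


def digit_recurrence_divide(x: Fraction, d: Fraction, steps: int) -> Fraction:
    """Radix-2 restoring recurrence (illustrative, not SRT).

    Assumes 0 <= x < d. Each step shifts the partial remainder and
    conditionally subtracts the divisor, retiring one quotient bit,
    so the error bound halves per step (linear convergence).
    """
    q = Fraction(0)
    w = Fraction(x)  # partial remainder
    for j in range(1, steps + 1):
        w = 2 * w
        if w >= d:  # quotient bit is 1: subtract the divisor
            w -= d
            q += Fraction(1, 2 ** j)
    return q


def newton_raphson_divide(x: Fraction, d: Fraction, iterations: int) -> Fraction:
    """Newton-Raphson reciprocal iteration followed by a final multiply.

    Assumes 1/2 <= d <= 1. The starting estimate 48/17 - (32/17) * d is
    an assumed linear approximation of 1/d on that interval. Each
    iteration uses two multiplications and squares the relative error
    (quadratic convergence).
    """
    y = Fraction(48, 17) - Fraction(32, 17) * d
    for _ in range(iterations):
        y = y * (2 - d * y)
    return x * y


if __name__ == "__main__":
    x, d = Fraction(1, 3), Fraction(3, 4)
    exact = x / d
    for n in (4, 8, 16, 32):
        err = exact - digit_recurrence_divide(x, d, n)
        print(f"digit recurrence, {n:2d} steps: error = {float(err):.3e}")
    for n in (0, 1, 2, 3):
        err = exact - newton_raphson_divide(x, d, n)
        print(f"Newton-Raphson, {n} iterations: error = {float(err):.3e}")
```

In this sketch the recurrence needs one subtraction step per quotient bit, whereas the multiplicative iteration roughly doubles the number of correct bits each time, which is why its latency depends so strongly on how fast the multiplier is.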