High-Speed VLSI Multiplication Algorithm with a Redundant Binary Addition Tree
IEEE Transactions on Computers
The Clipper processor: instruction set architecture and implementation
Communications of the ACM
Area and performance tradeoffs in floating-point divide and square-root implementations
ACM Computing Surveys (CSUR)
Design Issues in Division and Other Floating-Point Operations
IEEE Transactions on Computers
CMOS floating-point unit for the S/390 parallel enterprise server G4
IBM Journal of Research and Development - Special issue: IBM S/390 G3 and G4
The S/390 G5 floating-point unit
IBM Journal of Research and Development
Hi-index | 0.01 |
A recent Sparc (scalable processor architecture) processor consists of a two-chip configuration, containing the TMS390C601 integer unit (IU) and the TMS390C602A floating-point unit (FPU). The second device, an innovative coprocessor that lets the processor execute single- or double-precision floating-point instructions concurrently with IU operations is described. Dedicated floating-point hardware in the FPU increases the performance of the system. Running at clock periods as small as 20 ns, the chip should deliver 5.5 million double-precision floating-point operations per second under the Linpack benchmark (50-MHz clock rate). The FPU provides single- and double-precision arithmetic functions: addition, subtraction, multiplication, division, square root, compare, and convert. To minimize its math unit's latency, the FPU uses a highly parallel architecture requiring separate math units to optimize additions and multiplications. Traps stop the execution of a program to jump to software routine for handling data-dependent errors or to execute instructions not implemented in the hardware. Benchmark results are presented.