The floating-point unit (FPU) of the IBM z990 eServer™ is the first in an IBM mainframe with a fused multiply-add dataflow. It also marks the first use in an IBM mainframe of an SRT divide algorithm (named after Sweeney, Robertson, and Tocher, who independently proposed it). The FPU supports dual architectures: the zSeries® hexadecimal floating-point architecture and the IEEE 754 binary floating-point architecture. Six floating-point formats, including short, long, and extended operands, are supported in hardware. The throughput of this FPU is one multiply-add operation per cycle. Instructions are executed in five pipeline steps, and there are multiple provisions to avoid stalls in case of data dependencies. The FPU handles denormalized input operands and denormalized results without a stall (except for architectural program exceptions). It has a new extended-precision divide and square-root dataflow. This dataflow uses a radix-4 SRT algorithm (radix-2 for square root) and handles divide and square-root operations in multiple floating-point and fixed-point formats. For fixed-point divisions, a new mechanism improves performance by using an algorithm in which the number of divide iterations depends on the effective number of quotient bits.
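To make the radix-4 digit recurrence and the quotient-dependent iteration count more concrete, the following is a minimal Python sketch. It is not the z990 implementation. The abstract does not describe the quotient-digit selection logic or the iteration-count mechanism, so everything here is an assumption made for illustration: the function names (`srt_radix4_divide`, `effective_quotient_bits`, `radix4_iterations`), the use of exact rational arithmetic in place of a truncated residual and lookup table, and the leading-bit estimate of the quotient width.

```python
from fractions import Fraction


def srt_radix4_divide(x, d, iterations):
    """Radix-4 digit-recurrence divide with signed digits in {-2, ..., 2}.

    Illustrative assumption: the digit is chosen by exact rounding. Real SRT
    hardware selects it from a few leading bits of the residual and divisor.
    Returns (quotient, remainder) with x == quotient * d + remainder exactly.
    """
    x, d = Fraction(x), Fraction(d)
    if not (Fraction(1, 2) <= d < 1 and 0 <= x < 1):
        raise ValueError("expected 0 <= x < 1 and 1/2 <= d < 1")
    bound = Fraction(2, 3) * d      # residual bound for digit set {-2..2}
    w = x / 4                       # pre-scale so that |w| <= bound initially
    q = Fraction(0)
    for j in range(1, iterations + 1):
        shifted = 4 * w             # shift the residual left by one radix-4 digit
        digit = max(-2, min(2, round(shifted / d)))
        w = shifted - digit * d     # recurrence: w[j] = 4*w[j-1] - q[j]*d
        q += Fraction(digit, 4 ** j)
        assert abs(w) <= bound      # invariant that keeps the recurrence convergent
    scale = Fraction(4) ** iterations
    return 4 * q, 4 * w / scale     # undo the initial pre-scaling by 4


def effective_quotient_bits(dividend, divisor):
    """Upper bound on the bit length of dividend // divisor (positive integers).

    Hypothetical estimate from the leading-bit positions of the operands.
    """
    if dividend <= 0 or divisor <= 0:
        raise ValueError("expected positive integers")
    return max(0, dividend.bit_length() - divisor.bit_length() + 1)


def radix4_iterations(dividend, divisor):
    """Radix-4 iterations needed if each iteration retires two quotient bits."""
    return (effective_quotient_bits(dividend, divisor) + 1) // 2
```

Under these assumptions, `radix4_iterations(1000, 3)` returns 5, whereas a fixed schedule sized for a full 64-bit quotient would run 32 radix-4 iterations regardless of the operands. The redundant digit set is what allows the digit choice to be imprecise: any digit that keeps the residual within two thirds of the divisor is acceptable, which is why hardware can select it from a truncated estimate.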