Several commercial processors have adopted the radix-8 multiplier architecture, which reduces the number of partial products and thereby increases speed. Radix-8 encoding reduces the number of digits in the signed-digit representation. Its performance bottleneck is the generation of the term 3X, also referred to as the hard multiple. This term is usually computed by a shift-and-add operation, 3X = 2X + X, in a high-speed adder. In a 2X + X addition, adjacent full adders share the same input signal. This property permits simpler algebraic expressions for the 3X operation than for a conventional addition. This paper shows that the 3X operation can be expressed in terms of two signals, H_i and K_i, functionally equivalent to two carries. H_i and K_i are computed in parallel using architectures that lead to an area- and speed-efficient implementation. For comparison, standard-cell implementations of conventional adders are evaluated against the proposed circuits based on the H_i and K_i signals. As a result, the delay of the proposed serial scheme is reduced by roughly 67% at no additional cost in area, the delay and area of the carry look-ahead scheme are reduced by 20% and 17%, respectively, and those of the parallel prefix scheme by 26% and 46%, respectively.
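To make the role of the hard multiple concrete, the following is a minimal Python sketch, not the circuit proposed in the paper. It assumes a textbook radix-8 Booth recoding of a two's-complement multiplier and a plain ripple-carry model of 2X + X; the function names (`booth_radix8_digits`, `triple_ripple`, `multiply_radix8`) are hypothetical, and the paper's two-signal carry formulation is deliberately not reproduced because its equations are not given in the abstract.

```python
def booth_radix8_digits(y: int, width: int) -> list[int]:
    """Recode a signed multiplier y (fits in `width` two's-complement bits)
    into radix-8 signed digits in the range -4..4, least significant first."""
    def bit(i: int) -> int:
        # Python's >> on ints is arithmetic, so bits above the sign bit
        # are automatically sign-extended; the bit below position 0 is 0.
        return 0 if i < 0 else (y >> i) & 1

    num_digits = -(-width // 3)  # ceil(width / 3)
    digits = []
    for j in range(num_digits):
        i = 3 * j
        # Overlapping 4-bit window: bits i+2, i+1, i and the bit below.
        digits.append(-4 * bit(i + 2) + 2 * bit(i + 1) + bit(i) + bit(i - 1))
    return digits


def triple_ripple(x: int, width: int) -> int:
    """Compute 3X = 2X + X for an unsigned `width`-bit x with a ripple chain.

    Position i adds x_i (from X) and x_{i-1} (from 2X), so the full adders
    at positions i and i+1 both read the same input bit x_i.
    """
    bits = [(x >> i) & 1 for i in range(width)]
    carry = 0
    result = 0
    for i in range(width + 2):  # 3X needs at most width + 2 bits
        a = bits[i] if i < width else 0            # bit i of X
        b = bits[i - 1] if 1 <= i <= width else 0  # bit i of 2X
        result |= (a ^ b ^ carry) << i
        carry = (a & b) | (a & carry) | (b & carry)
    return result


def multiply_radix8(x: int, y: int, width: int) -> int:
    """Multiply unsigned x by signed y using radix-8 partial products."""
    # X, 2X and 4X are shifts; only 3X (the hard multiple) needs an adder.
    multiples = {0: 0, 1: x, 2: x << 1, 3: triple_ripple(x, width), 4: x << 2}
    product = 0
    for j, d in enumerate(booth_radix8_digits(y, width)):
        partial = multiples[abs(d)]
        product += (partial if d >= 0 else -partial) << (3 * j)
    return product
```

For example, `booth_radix8_digits(11, 6)` yields `[3, 1]` (11 = 3 + 1·8), so a 6-bit multiplier needs only two partial products, one of which is the hard multiple 3X. The ripple model is the slowest possible way to form 3X and is shown only to expose the shared-input structure that the abstract says can be exploited.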