In this paper, radix-4 algorithms for square root and division are developed. The division algorithm evaluates the more useful function xz/y. These algorithms are shown to be suitable for implementation as a unified hardware unit that evaluates square root, division, and multiplication. Hardware cost is reduced by using gate arrays. A design based on the Motorola MCA2500 series of Macrocell gate arrays (MCAs) is presented. At a cost of 9 MCAs and 16 commercial ECL 100K parts, a 64-bit square root can be evaluated in 750 ns using worst-case delays. Division takes 710 ns and multiplication 325 ns. Redundancy in the digit set, together with carry-save adders, is used to achieve this high performance.
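As an illustration of how a redundant digit set works in a radix-4 digit recurrence, the following is a minimal sketch of plain division x/d with quotient digits drawn from {-2, ..., 2}. It is not the algorithm of the paper: it does not compute xz/y, it uses exact rational arithmetic instead of carry-save adders, and it selects each digit by exact rounding rather than from a truncated estimate of the residual. The names `select_digit` and `radix4_divide` and the input restriction 0 <= x < d are assumptions made for this example.

```python
from fractions import Fraction

def select_digit(shifted, d):
    """Pick the quotient digit nearest to shifted/d, clamped to {-2,...,2}.

    Assumption: exact selection. Hardware would use a few leading bits of a
    carry-save residual, which the redundancy of the digit set tolerates.
    """
    q = round(shifted / d)
    return max(-2, min(2, q))

def radix4_divide(x, d, n_digits):
    """Approximate x/d using n_digits radix-4 signed digits (hypothetical helper)."""
    x, d = Fraction(x), Fraction(d)
    if d <= 0 or not (0 <= x < d):
        raise ValueError("this sketch assumes 0 <= x < d")
    w = x / 4                      # initial residual, so that |w| <= (2/3) d
    quotient = Fraction(0)
    digits = []
    for j in range(1, n_digits + 1):
        shifted = 4 * w            # radix-4 left shift of the residual
        q = select_digit(shifted, d)
        w = shifted - q * d        # recurrence: w[j] = 4 w[j-1] - q[j] d
        # Redundancy keeps the residual bounded even though digits are clamped.
        assert abs(w) <= Fraction(2, 3) * d
        quotient += Fraction(q, 4 ** j)
        digits.append(q)
    return 4 * quotient, digits    # undo the initial scaling by 1/4

approx, digits = radix4_divide(1, 3, 10)
print(float(approx), digits)
```

Because the residual stays within (2/3)d, the error after n digits is at most (8/3) * 4^(-n), so each iteration contributes two bits of the quotient. The slack between the digit range and the residual bound is what would let a hardware implementation choose digits from an imprecise residual estimate.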