Constructive real interpretation of numerical programs
SIGPLAN '87 Papers of the Symposium on Interpreters and interpretive techniques
Introduction to programmable active memories
Systolic array processors
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
The art of computer programming, volume 2 (3rd ed.): seminumerical algorithms
A Survey of Hardware Implementation of RSA (Abstract)
CRYPTO '89 Proceedings of the 9th Annual International Cryptology Conference on Advances in Cryptology
A high-performance microarchitecture with hardware-programmable functional units
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Programmable active memories: reconfigurable systems come of age
Readings in hardware/software co-design
Systolic Modular Multiplication
IEEE Transactions on Computers
PAS '95 Proceedings of the First Aizu International Symposium on Parallel Algorithms/Architecture Synthesis
Portable Multiprecision Arithmetic Package Based on Message Passing Interface
PAS '97 Proceedings of the 2nd AIZU International Symposium on Parallel Algorithms / Architecture Synthesis
Constant Coefficient Multiplication Using Look-Up Tables
Journal of VLSI Signal Processing Systems
We present various experiments on the hardware/software design tradeoffs encountered in speeding up long integer multiplication. This work spans more than a year, with more than 12 different hardware designs tested and measured.

To implement these designs, we rely on our PAM (Programmable Active Memory, see [BRV]) technology, which provides us with a silicon foundry with a 50 millisecond turn-around time for implementing logic designs of up to 50K gates, fully equipped with fast local RAM and a host bus interface.

First, we demonstrate how a simple hardware 512-bit integer multiplier coupled with a low-end workstation host yields performance on long arithmetic superior to that of the fastest computers for which we could obtain actual benchmark figures.

Second, we specialize this hardware in order to speed up one specific application of long integer arithmetic, namely Rivest-Shamir-Adleman public-key cryptography [RSA]. We demonstrate how a single host driving 3 differently configured PAM boards delivers RSA encryption and decryption at more than 200 Kbits/sec for 512-bit keys. This beats the best currently working VLSI chip specially built for RSA by one order of magnitude.
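
To make the division of labour between host and multiplier concrete, the following is a minimal software sketch, not the authors' design. It assumes the host splits long operands into 512-bit limbs, sends each pair of limbs to a fixed-width multiplier, and accumulates the shifted partial products (plain schoolbook multiplication). The names `hw_mul512`, `to_limbs`, `long_mul` and `mod_exp` are hypothetical, and `hw_mul512` is only a software stand-in for the hardware unit. The RSA-style modular exponentiation uses ordinary square-and-multiply with Python's `%` for reduction, which says nothing about how the PAM boards actually perform modular reduction.

```python
LIMB_BITS = 512                      # width of the assumed hardware multiplier
LIMB_MASK = (1 << LIMB_BITS) - 1


def hw_mul512(a, b):
    """Software stand-in for a 512 x 512 -> 1024 bit hardware multiplier."""
    assert 0 <= a <= LIMB_MASK and 0 <= b <= LIMB_MASK
    return a * b


def to_limbs(x):
    """Split a non-negative integer into 512-bit limbs, least significant first."""
    limbs = []
    while x:
        limbs.append(x & LIMB_MASK)
        x >>= LIMB_BITS
    return limbs or [0]


def long_mul(x, y):
    """Schoolbook product of two non-negative integers, one limb pair at a time."""
    result = 0
    for i, xi in enumerate(to_limbs(x)):
        for j, yj in enumerate(to_limbs(y)):
            # Each partial product is shifted to the position of its limb pair.
            result += hw_mul512(xi, yj) << (LIMB_BITS * (i + j))
    return result


def mod_exp(base, exponent, modulus):
    """Right-to-left square-and-multiply, as used for RSA encryption/decryption."""
    result = 1 % modulus
    base %= modulus
    while exponent:
        if exponent & 1:
            result = long_mul(result, base) % modulus
        base = long_mul(base, base) % modulus
        exponent >>= 1
    return result
```

With a 512-bit modulus every operand fits in a single limb, so each step of `mod_exp` costs one call to the multiplier plus a reduction; longer operands fall back on the quadratic limb loop in `long_mul`.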