Proceedings on Advances in cryptology---CRYPTO '86
A cryptographic library for the Motorola DSP56000
EUROCRYPT '90 Proceedings of the workshop on the theory and application of cryptographic techniques on Advances in cryptology
Hardware Implementation of Montgomery's Modular Multiplication Algorithm
IEEE Transactions on Computers
A method for obtaining digital signatures and public-key cryptosystems
Communications of the ACM
Systolic Modular Multiplication
IEEE Transactions on Computers
Montgomery's Multiplication Technique: How to Make It Smaller and Faster
CHES '99 Proceedings of the First International Workshop on Cryptographic Hardware and Embedded Systems
Fast Implementation of Public-Key Cryptography ona DSP TMS320C6201
CHES '99 Proceedings of the First International Workshop on Cryptographic Hardware and Embedded Systems
A Scalable Architecture for Montgomery Multiplication
CHES '99 Proceedings of the First International Workshop on Cryptographic Hardware and Embedded Systems
A Scalable GF(p) Elliptic Curve Processor Architecture for Programmable Hardware
CHES '01 Proceedings of the Third International Workshop on Cryptographic Hardware and Embedded Systems
High-Radix Design of a Scalable Modular Multiplier
CHES '01 Proceedings of the Third International Workshop on Cryptographic Hardware and Embedded Systems
Bipartite modular multiplication
CHES'05 Proceedings of the 7th international conference on Cryptographic hardware and embedded systems
Hi-index | 0.00 |
We show the fastest implementation result of RSA on Itanium 2. For realizing the fast implementation, we improved the implementation algorithm of Montgomery multiplication proposed by Itoh et al. By using our implementation algorithm, pilepine delay is decreased than previous one on Itanium 2. And we implemented this algorithm with highly optimized for parallel processing. Our code can execute 4 instructions per cycle (At maximum, 6 instructions are executed per cycle on Itanium 2), and its probability of pipeline stalling is just only 5%. Our RSA implementation using this code performs 32 times per second of 4096-bit RSA decryption with CRT on Itanium 2 at 900MHz. As a result, our implementation of RSA is the fastest on Itanium2. This is 3.1 times faster than IPP, a software library developed by Intel, in the best case.