A method for obtaining digital signatures and public-key cryptosystems
Communications of the ACM
A Scalable Architecture for Montgomery Multiplication
CHES '99 Proceedings of the First International Workshop on Cryptographic Hardware and Embedded Systems
A Scalable Architecture for Modular Multiplication Based on Montgomery's Algorithm
IEEE Transactions on Computers
An Improved Unified Scalable Radix-2 Montgomery Multiplier
ARITH '05 Proceedings of the 17th IEEE Symposium on Computer Arithmetic
A Scalable Architecture for RSA Cryptography on Large FPGAs
FCCM '06 Proceedings of the 14th Annual IEEE Symposium on Field-Programmable Custom Computing Machines
Implementing the elliptic curve method of factoring in reconfigurable hardware
CHES'06 Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems
Reduced redundant arithmetic applied on low power multiply-accumulate units
EHAC'12/ISPRA/NANOTECHNOLOGY'12 Proceedings of the 11th WSEAS international conference on Electronics, Hardware, Wireless and Optical Communications, and proceedings of the 11th WSEAS international conference on Signal Processing, Robotics and Automation, and proceedings of the 4th WSEAS international conference on Nanotechnology
VHDL code generator for optimized carry-save reduction strategy in low power computer arithmetic
CSCC'11 Proceedings of the 2nd international conference on Circuits, Systems, Communications & Computers
On carry-save strategies for multiply-accumulate arithmetic
CSCC'11 Proceedings of the 2nd international conference on Circuits, Systems, Communications & Computers
Attacking RSA---CRT signatures with faults on montgomery multiplication
CHES'12 Proceedings of the 14th international conference on Cryptographic Hardware and Embedded Systems
An efficient RSA implementation without precomputation
Inscrypt'11 Proceedings of the 7th international conference on Information Security and Cryptology
A compact FPGA-based montgomery multiplier over prime fields
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Hi-index | 0.00 |
Montgomery modular multiplication is one of the fundamental operations used in cryptographic algorithms, such as RSA and Elliptic Curve Cryptosystems. At CHES 1999, Tenca and Koç introduced a now-classical architecture for implementing Montgomery multiplication in hardware. With parameters optimized for minimum latency, this architecture performs a single Montgomery multiplication in approximately 2n clock cycles, where n is the size of operands in bits. In this paper we propose and discuss an optimized hardware architecture performing the same operation in approximately n clock cycles with almost the same clock period. Our architecture is based on pre-computing partial results using two possible assumptions regarding the most significant bit of the previous word, and is only marginally more demanding in terms of the circuit area. The new radix-2 architecture can be extended for the case of radix-4, while preserving a factor of two speed-up over the corresponding radix-4 design by Tenca, Todorov, and Koç from CHES 2001. Our architecture has been verified by modeling it in Verilog-HDL, implementing it using Xilinx Virtex-II 6000 FPGA, and experimentally testing it using SRC-6 reconfigurable computer.