This paper describes an algorithm for computing modular exponentiation using vector (SIMD) instructions. It demonstrates, for the first time, how such a software approach can outperform classical scalar (ALU) implementations on high-end x86_64 platforms that have a wide SIMD architecture. We target speeding up RSA2048 on Intel's soon-to-arrive platforms that support the AVX2 instruction set. To this end, we applied our algorithm to generate an optimized AVX2-based software implementation of 1024-bit modular exponentiation. This implementation is seamlessly integrated into OpenSSL as a patch over OpenSSL 1.0.1. Our results show that it requires 51% fewer instructions than the current OpenSSL 1.0.1 implementation. This illustrates the potentially significant speedup in RSA2048 performance that can be expected on the coming (2013) Intel processors. The impact of such a speedup on servers is noticeable, especially since NIST recommends migrating to RSA2048 starting in 2013.
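
The abstract does not spell out why a 1024-bit modular exponentiation speeds up RSA2048. A standard explanation, assumed here rather than taken from the paper, is the Chinese Remainder Theorem (CRT): a private-key operation modulo a 2048-bit n = p*q can be computed as two exponentiations modulo the 1024-bit primes p and q, followed by a cheap recombination. The sketch below illustrates this with toy primes and Python's built-in `pow`; the function name `rsa_crt_private` and all parameter values are hypothetical, and it is not the AVX2 implementation described above.

```python
def rsa_crt_private(c, p, q, d):
    """Compute c^d mod (p*q) using two half-size modular exponentiations.

    Illustrative only: toy-sized inputs, no padding, not constant-time.
    """
    dp = d % (p - 1)            # reduced exponent for the mod-p half
    dq = d % (q - 1)            # reduced exponent for the mod-q half
    q_inv = pow(q, -1, p)       # q^{-1} mod p (requires Python 3.8+)

    m1 = pow(c % p, dp, p)      # half-size exponentiation modulo p
    m2 = pow(c % q, dq, q)      # half-size exponentiation modulo q

    # Garner's recombination: the unique m in [0, p*q) with
    # m = m1 (mod p) and m = m2 (mod q).
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters (assumed for illustration; real RSA2048 uses 1024-bit primes).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

m = 65
c = pow(m, e, n)

# The CRT path agrees with the direct full-size exponentiation.
assert rsa_crt_private(c, p, q, d) == pow(c, d, n) == m
```

With this split, the cost of a private-key operation is dominated by the two half-size exponentiations, which is why an optimized 1024-bit routine bears directly on RSA2048 performance.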