Software implementation of modular exponentiation, using advanced vector instructions architectures

Authors:
Shay Gueron; Vlad Krasnov
Affiliations:
Department of Mathematics, University of Haifa, Israel, and Intel Corporation, Israel Development Center, Haifa, Israel; Intel Corporation, Israel Development Center, Haifa, Israel
Venue:
WAIFI'12 Proceedings of the 4th international conference on Arithmetic of Finite Fields
Year:
2012

Abstract

This paper describes an algorithm for computing modular exponentiation using vector (SIMD) instructions. It demonstrates, for the first time, how such a software approach can outperform classical scalar (ALU) implementations on high-end x86_64 platforms that have a wide SIMD architecture. We target speeding up RSA2048 on Intel's soon-to-arrive platforms that support the AVX2 instruction set. To this end, we applied our algorithm to produce an optimized AVX2-based software implementation of 1024-bit modular exponentiation, the operation that an RSA2048 private-key computation performs twice when it uses the Chinese Remainder Theorem. This implementation is seamlessly integrated into OpenSSL as a patch over OpenSSL 1.0.1. Our results show that our implementation requires 51% fewer instructions than the current OpenSSL 1.0.1 implementation. This illustrates the potentially significant speedup in RSA2048 performance that is expected on the coming (2013) Intel processors. The impact of such a speedup on servers is noticeable, especially since NIST recommends migrating to RSA2048 starting from 2013.
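The link between RSA2048 and 1024-bit modular exponentiation can be illustrated with a minimal sketch, assuming the standard Chinese Remainder Theorem (CRT) formulation of the RSA private-key operation. This is not code from the paper or from OpenSSL: the function name `rsa_private_crt` and the toy primes are hypothetical and chosen only for illustration, and Python's built-in `pow` stands in for the optimized exponentiation routine. With a 2048-bit modulus, `p` and `q` would each be 1024 bits, so the two `pow` calls are the 1024-bit modular exponentiations that a vectorized implementation would accelerate.

```python
def rsa_private_crt(c, p, q, d):
    """Compute c^d mod (p*q) using two half-size modular exponentiations.

    Illustrative only: the name and interface are assumptions, not an
    OpenSSL API. Requires Python 3.8+ for pow(x, -1, m).
    """
    dp = d % (p - 1)          # reduced exponent for the modulus p
    dq = d % (q - 1)          # reduced exponent for the modulus q
    q_inv = pow(q, -1, p)     # q^{-1} mod p, precomputable per key

    m1 = pow(c % p, dp, p)    # first half-size exponentiation
    m2 = pow(c % q, dq, q)    # second half-size exponentiation

    # Garner's recombination: lift (m1, m2) to the unique value mod p*q.
    h = (q_inv * (m1 - m2)) % p
    return m2 + h * q


# Toy parameters (far too small for real use; assumption for the demo).
p, q = 61, 53
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

m = 65
c = pow(m, e, n)                      # public-key operation
recovered = rsa_private_crt(c, p, q, d)
print(recovered == m)
```

The design point is that each exponentiation works on operands half the size of the modulus, which is why speeding up the 1024-bit routine translates directly into RSA2048 private-key performance.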