A method for obtaining digital signatures and public-key cryptosystems
Communications of the ACM
Parallel Cryptographic Arithmetic Using a Redundant Montgomery Representation
IEEE Transactions on Computers
IEEE Transactions on Computers
Enhanced montgomery multiplication on DSP architectures for embedded public-key cryptosystems
EURASIP Journal on Embedded Systems - Embedded System Design in Intelligent Industrial Automation
GPU-Accelerated Montgomery Exponentiation
ICCS '07 Proceedings of the 7th international conference on Computational Science, Part I: ICCS 2007
Exploiting the Power of GPUs for Asymmetric Cryptography
CHES '08 Proceeding sof the 10th international workshop on Cryptographic Hardware and Embedded Systems
Toward acceleration of RSA using 3D graphics hardware
Cryptography and Coding'07 Proceedings of the 11th IMA international conference on Cryptography and coding
Hardware/software co-design of elliptic curve cryptography on an 8051 microcontroller
CHES'06 Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems
Hi-index | 0.00 |
Parallel programming techniques have become one of the great challenges in the transition from single-core to multi-core architectures. In this paper, we investigate the parallelization of the Montgomery multiplication, a very common and time-consuming primitive in public-key cryptography. A scalable parallel programming scheme, called pSHS, is presented to map the Montgomery multiplication to a general multicore architecture. The pSHS scheme offers a considerable speedup. Based on 2-, 4-, and 8-core systems, the speedup of a parallelized 2048-bit Montgomery multiplication is 1.98, 3.74, and 6.53, respectively. pSHS delivers stable performance, high portability, high throughput and low latency over different multicore systems. These make pSHS a good candidate for public-key software implementations, including RSA, DSA, and ECC, based on general multicore platforms. We present a detailed analysis of pSHS, and verify it on dual-core, quad-core and eight-core prototypes.