Graphics processing units (GPUs) are increasingly used for general-purpose computing. We present implementations of large-integer modular exponentiation, the core of public-key cryptosystems such as RSA, on a DirectX 10 compliant GPU. DirectX 10 compliant graphics processors are the latest generation of GPU architecture and provide increased programming flexibility and support for integer operations. We present high-performance modular exponentiation implementations based on integers represented in both standard radix form and residue number system (RNS) form. We show that a GPU implementation of a 1024-bit RSA decrypt primitive can outperform a comparable CPU implementation by up to 4 times, and can improve on previous GPU implementations by reducing latency by up to 7 times and doubling throughput. We show that an adaptive approach to modular exponentiation, which combines radix-based and RNS-based implementations, gives the best all-around performance on the GPU in terms of both latency and throughput. We also highlight the usage criteria that must be met for the GPU to reach peak performance on public-key cryptographic operations.
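To make the two integer representations concrete, the following is a minimal CPU-side sketch in Python, not the GPU implementation described above. It shows textbook square-and-multiply modular exponentiation on ordinary (radix) integers, and a toy residue number system in which multiplication is carried out independently per modulus. The names `mod_exp`, `to_rns`, `rns_mul`, `from_rns` and the small basis `MODULI` are illustrative assumptions. Reduction modulo an RSA modulus inside the RNS needs additional machinery (for example, base extension in an RNS Montgomery scheme) that is deliberately left out here.

```python
from math import prod

def mod_exp(base, exponent, modulus):
    """Left-to-right binary (square-and-multiply) modular exponentiation."""
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:          # scan exponent bits, MSB first
        result = (result * result) % modulus
        if bit == "1":
            result = (result * base) % modulus
    return result

# Hypothetical toy basis: pairwise coprime moduli (251, 11*23, 3*5*17, 2**8).
MODULI = [251, 253, 255, 256]

def to_rns(x, moduli=MODULI):
    """Represent x by its residues modulo each basis element."""
    return [x % m for m in moduli]

def rns_mul(a, b, moduli=MODULI):
    """Multiply channel by channel; no carries cross between channels."""
    return [(ai * bi) % m for ai, bi, m in zip(a, b, moduli)]

def from_rns(residues, moduli=MODULI):
    """Recover the integer modulo prod(moduli) via the Chinese Remainder Theorem."""
    M = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        Mi = M // m
        total += r * Mi * pow(Mi, -1, m)   # modular inverse, Python 3.8+
    return total % M

if __name__ == "__main__":
    print(mod_exp(4, 13, 497))
    product = from_rns(rns_mul(to_rns(12345), to_rns(6789)))
    print(product == 12345 * 6789)
```

Each RNS channel in `rns_mul` depends only on its own residues, which is the property that makes this representation attractive on parallel hardware. The result is exact only while the true product stays below the product of the moduli, so a real implementation would use a much larger basis.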