A hardware-accelerated ECDLP with high-performance modular multiplication

  • Authors:
  • Lyndon Judge;Suvarna Mane;Patrick Schaumont

  • Affiliations:
  • Bradley Department of Electrical and Computer Engineering, Center for Embedded Systems for Critical Applications, Virginia Tech, Blacksburg, VA (all three authors)

  • Venue:
  • International Journal of Reconfigurable Computing - Special issue on Selected Papers from the 2011 International Conference on Reconfigurable Computing and FPGAs (ReConFig 2011)
  • Year:
  • 2012

Abstract

Elliptic curve cryptography (ECC) has become a popular public key cryptography standard. The security of ECC rests on the difficulty of solving the elliptic curve discrete logarithm problem (ECDLP). In this paper, we demonstrate a successful attack on ECC over a prime field using the Pollard rho algorithm implemented on a hardware-software cointegrated platform. We propose a high-performance architecture for multiplication over a prime field using specialized DSP blocks in the FPGA. We characterize this architecture by exploring the design space to determine the optimal integer basis for polynomial representation, and we demonstrate an efficient mapping of this design to multiple standard prime field elliptic curves. We use the resulting modular multiplier to demonstrate low-latency multiplications for curves secp112r1 and P-192. We apply our modular multiplier to implement a complete attack on secp112r1 using a Nallatech FSB-Compute platform with a Virtex-5 FPGA. The measured performance of the resulting design is 114 cycles per Pollard rho step at 100 MHz, which gives 878K iterations per second per ECC core. We extend this design to a multicore ECDLP implementation that achieves 14.05M iterations per second with 16 parallel point addition cores.
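
As a rough illustration of the attack named in the abstract, the following is a minimal software sketch of Pollard rho for the ECDLP. It is not the authors' hardware design. Everything specific here is an assumption made for illustration: the toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1) of prime order 19 (instead of secp112r1), the additive walk with a small table of random steps, Floyd cycle detection, and the helper names `add`, `mul`, and `pollard_rho`. It needs Python 3.8 or later for `pow(x, -1, m)`.

```python
import random

# Toy parameters (assumed for illustration, far smaller than secp112r1):
# curve y^2 = x^3 + 2x + 2 over F_17, base point G of prime order N = 19.
P_MOD, A = 17, 2
G, N = (5, 1), 19

def add(p, q):
    """Affine point addition; None represents the point at infinity."""
    if p is None:
        return q
    if q is None:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None  # p + (-p), including doubling a point with y = 0
    if p == q:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return (x3, (lam * (x1 - x3) - y1) % P_MOD)

def mul(k, p):
    """Scalar multiplication by double-and-add."""
    acc = None
    while k:
        if k & 1:
            acc = add(acc, p)
        p = add(p, p)
        k >>= 1
    return acc

def pollard_rho(q, branches=4, seed=0):
    """Return k with q == mul(k, G), for q in the subgroup generated by G."""
    rng = random.Random(seed)
    while True:
        # Additive walk: precompute steps R_i = a_i*G + b_i*Q.
        table = []
        for _ in range(branches):
            a, b = rng.randrange(N), rng.randrange(N)
            table.append((a, b, add(mul(a, G), mul(b, q))))

        def step(state):
            # Each step is one point addition; the invariant X = a*G + b*Q holds.
            a, b, x = state
            i = 0 if x is None else x[0] % branches
            da, db, r = table[i]
            return ((a + da) % N, (b + db) % N, add(x, r))

        a0, b0 = rng.randrange(N), rng.randrange(N)
        tort = hare = (a0, b0, add(mul(a0, G), mul(b0, q)))
        while True:  # Floyd cycle detection: hare moves twice as fast
            tort = step(tort)
            hare = step(step(hare))
            if tort[2] == hare[2]:
                break
        # a1*G + b1*Q == a2*G + b2*Q  =>  k = (a1 - a2) / (b2 - b1) mod N
        db = (hare[1] - tort[1]) % N
        if db:
            return (tort[0] - hare[0]) * pow(db, -1, N) % N
        # Useless collision (b1 == b2): retry with a fresh walk.
```

Calling `pollard_rho(mul(7, G))` recovers the scalar 7 on this toy curve. Each call to `step` performs one point addition, which is the unit of work the abstract measures as one Pollard rho step.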