Scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n)

  • Authors:
  • Kazuyuki Tanimura;Ryuta Nara;Shunitsu Kohara;Kazunori Shimizu;Youhua Shi;Nozomu Togawa;Masao Yanagisawa;Tatsuo Ohtsuki

  • Affiliations:
  • Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan;Waseda University, Shinjuku, Tokyo, Japan

  • Venue:
  • Proceedings of the 2008 Asia and South Pacific Design Automation Conference
  • Year:
  • 2008

Abstract

Modular multiplication is the dominant arithmetic operation in elliptic curve cryptography (ECC), a type of public-key cryptography. Montgomery multiplication is commonly used to perform modular multiplication, and it must be scalable because the bit length of the operands varies with the security level. In addition, ECC is performed in GF(P) or GF(2^n), so unified multiplier architectures for GF(P) and GF(2^n) are needed. However, previous works require either changing the clock frequency or a dual-radix architecture to cope with the delay-time difference between the GF(P) and GF(2^n) circuits of the multiplier, because the critical path of the GF(P) circuit is longer. This paper proposes a scalable unified dual-radix architecture for Montgomery multiplication in GF(P) and GF(2^n). The proposed architecture unifies four parallel radix-2^16 multipliers in GF(P) and a radix-2^64 multiplier in GF(2^n) into a single unit. Applying a lower radix to the GF(P) multiplier shortens its critical path and makes it possible to compute operands in both fields with the same multiplier at the same frequency, so clock dividers to handle the delay-time difference are not required. Moreover, the parallel architecture in GF(P) reduces the clock cycles added by the dual-radix approach. Consequently, the proposed architecture computes a 256-bit GF(P) Montgomery multiplication in 0.23 μs.
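
The radix trade-off described in the abstract can be illustrated with a small software model. The sketch below is not the authors' hardware design: it is a textbook word-serial Montgomery recurrence written in Python, with the word width `w` and word count `s` as parameters. All function names (`mont_mul_gfp`, `mont_mul_gf2n`, `clmul`, `poly_inv_mod_xw`, `poly_mod`) are hypothetical helpers introduced here for illustration. It assumes Python 3.8 or later, an odd modulus `p` for GF(P), and a reduction polynomial `f` with constant term 1 for GF(2^n). How the four radix-2^16 multipliers share the work in the proposed unit is not stated in the abstract and is not modeled.

```python
def mont_mul_gfp(a, b, p, w, s):
    """Return a*b*2^(-w*s) mod p, consuming one w-bit word of a per step.

    Assumes p is odd, p < 2^(w*s), and 0 <= a, b < p.
    """
    mask = (1 << w) - 1
    p_neg_inv = (-pow(p, -1, 1 << w)) & mask   # -p^(-1) mod 2^w
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask            # i-th radix-2^w digit of a
        t += a_i * b
        q = ((t & mask) * p_neg_inv) & mask    # makes t + q*p divisible by 2^w
        t = (t + q * p) >> w
    return t - p if t >= p else t              # t < 2p, so one subtraction suffices


def clmul(a, b):
    """Carry-less (polynomial over GF(2)) product of two non-negative ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def poly_inv_mod_xw(f, w):
    """Inverse of polynomial f modulo x^w; requires f to have constant term 1."""
    inv = 1
    for i in range(1, w):
        if (clmul(f, inv) >> i) & 1:           # cancel the x^i term of f*inv
            inv |= 1 << i
    return inv


def mont_mul_gf2n(a, b, f, w, s):
    """Return a*b*x^(-w*s) mod f over GF(2), one w-bit word of a per step.

    Assumes deg(f) <= w*s and deg(a), deg(b) < deg(f).
    """
    mask = (1 << w) - 1
    f_inv = poly_inv_mod_xw(f & mask, w)       # in GF(2), -f^(-1) equals f^(-1)
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask
        t ^= clmul(a_i, b)                     # addition is XOR: no carry chain
        q = clmul(t & mask, f_inv) & mask
        t = (t ^ clmul(q, f)) >> w
    return t                                   # degree already below deg(f)


def poly_mod(a, f):
    """Remainder of polynomial a modulo f over GF(2) (used only for checking)."""
    n = f.bit_length() - 1
    while a.bit_length() > n:
        a ^= f << (a.bit_length() - 1 - n)
    return a
```

In this model a 256-bit GF(P) operand takes 16 loop iterations with `w=16, s=16` and 4 iterations with `w=64, s=4`, and both calls return the same value because the Montgomery factor is 2^256 in either case. The extra iterations at the lower radix correspond to the additional clock cycles that the abstract says the parallel GF(P) multipliers are meant to offset. The GF(2^n) routine replaces integer addition with XOR, which is one common explanation for why that datapath has no carry propagation and can afford a larger radix.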