Evaluating Instruction Set Extensions for Fast Arithmetic on Binary Finite Fields

  • Authors:
  • A. Murat Fiskiran; Ruby B. Lee

  • Affiliations:
  • Princeton University; Princeton University

  • Venue:
  • ASAP '04 Proceedings of the Application-Specific Systems, Architectures and Processors, 15th IEEE International Conference
  • Year:
  • 2004

Abstract

Binary finite fields GF(2^n) are very commonly used in cryptography, particularly in public-key algorithms such as Elliptic Curve Cryptography (ECC). On word-oriented programmable processors, field elements are generally represented as polynomials with coefficients from {0, 1}. Key arithmetic operations on these polynomials, such as squaring and multiplication, are not supported by integer-oriented processor architectures. Instead, these are implemented in software, causing a very large fraction of the cryptography execution time to be dominated by a few elementary operations. For example, more than 90% of the execution time of 163-bit ECC may be consumed by two simple field operations: squaring and multiplication. A few processor architectures have been proposed recently that include instructions for binary field arithmetic. However, these have only considered processors with small wordsizes and in-order, single-issue execution. The first contribution of this paper is to validate these new arithmetic instructions for processors with wider wordsizes and multiple-issue (e.g. superscalar) execution. We also consider the effects of varying the number of functional units and load/store pipes. We demonstrate that the combination of microarchitecture and new instructions provides speedups up to 22.4× for ECC point multiplication. Second, we show that if a bit-level reverse instruction is included in the instruction set, the size of the multiplier can be reduced by half without significant performance degradation. Third, we compare the benefits of superscalar execution with wordsize scaling. The latter has been used in recent processor architectures such as PLX and PAX as a new way to extract parallelism. We show that 2× wordsize scaling provides 70% better performance than 2-way superscalar execution. 
Finally, we suggest a low-cost method, which we call multi-word result execution, to realize some of the benefits of wordsize scaling in existing processors with fixed wordsizes.
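
To make the operations named in the abstract concrete, the following is a minimal software sketch of polynomial multiplication, squaring, and reduction over GF(2), the kind of shift-and-XOR loop an integer-oriented processor has to run when it has no dedicated instruction. All function names are hypothetical, and the GF(2^8) field with modulus x^8 + x^4 + x^3 + x + 1 (0x11B) is chosen here only to keep the example small; it is not taken from the paper. The last two functions illustrate one well-known identity by which a bit-reverse operation lets a multiplier that produces only the low half of a product also yield the high half. The abstract does not spell out the paper's construction, so this is an assumption about how such a saving could work, not a description of the authors' design.

```python
def clmul(a: int, b: int) -> int:
    """Carry-less product of two GF(2) polynomials packed into integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # addition in GF(2) is XOR, so no carries propagate
        a <<= 1
        b >>= 1
    return result


def poly_square(a: int) -> int:
    """Square over GF(2): bit i of a moves to bit 2*i, with zeros between."""
    result = 0
    position = 0
    while a:
        if a & 1:
            result |= 1 << (2 * position)
        a >>= 1
        position += 1
    return result


def poly_reduce(c: int, modulus: int) -> int:
    """Reduce polynomial c modulo the field polynomial `modulus`."""
    degree = modulus.bit_length() - 1
    while c.bit_length() > degree:
        # Cancel the current leading term of c with a shifted modulus.
        c ^= modulus << (c.bit_length() - 1 - degree)
    return c


def field_mul(a: int, b: int, modulus: int) -> int:
    """Multiply two field elements: carry-less product, then reduction."""
    return poly_reduce(clmul(a, b), modulus)


def bit_reverse(x: int, width: int) -> int:
    """Reverse the low `width` bits of x."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (x & 1)
        x >>= 1
    return result


def clmul_low(a: int, b: int, width: int) -> int:
    """Model of a half-size multiplier: only the low `width` product bits."""
    return clmul(a, b) & ((1 << width) - 1)


def clmul_high(a: int, b: int, width: int) -> int:
    """High product bits from the low-half multiplier plus bit reversal.

    Assumes a and b both fit in `width` bits.
    """
    low_of_reversed = clmul_low(bit_reverse(a, width),
                                bit_reverse(b, width), width)
    return bit_reverse(low_of_reversed, width) >> 1


# Usage: a GF(2^8) product, and a full 8x8 product rebuilt from two halves.
AES_MODULUS = 0x11B  # x^8 + x^4 + x^3 + x + 1, illustrative choice only
product = field_mul(0x57, 0x83, AES_MODULUS)
full = (clmul_high(0x57, 0x83, 8) << 8) | clmul_low(0x57, 0x83, 8)
```

The identity behind `clmul_high` is that reversing the bits of both operands reverses the bits of the carry-less product, so the low half of the reversed product holds the high half of the original. In hardware the loop in `clmul` would be a single instruction; the sketch only shows what that instruction replaces.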