Effects of Instruction-Set Extensions on an Embedded Processor: A Case Study on Elliptic Curve Cryptography over GF(2^m)

  • Authors:
  • Sandro Bartolini;Irina Branovic;Roberto Giorgi;Enrico Martinelli

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2008

Abstract

Elliptic curve (EC) cryptography is expected to play a significant role in enabling information security in constrained embedded devices. To be efficient on a target architecture, EC cryptosystems (ECC) require careful selection and tuning of the algorithms that perform the underlying mathematical operations. This paper performs a cycle-level analysis of how ECC performance depends on the interaction between the features of the mathematical algorithms and the actual architectural and microarchitectural features of an ARM XScale processor. The paper investigates the origin of performance by breaking execution down into the cycles spent in the different activities at the field level and at the elliptic-curve level. In addition, we perform a cycle-level analysis of a modified ARM processor that includes in its datapath a word-level finite field polynomial multiplier (poly_mul). The paper points out the most advantageous mix of EC parameters both for the standard ARM XScale platform and for the one equipped with the poly_mul unit. With the poly_mul unit, execution time is reduced by more than 41% on the considered benchmarks. The paper then analyses the correlation between EC benchmark performance and the possible architectural organizations of a processor equipped with poly_mul unit(s). For instance, only superscalar pipelines can exploit the features of out-of-order execution, and only very complex organizations (e.g., 4-way superscalar) can exploit a high number of available ALUs. Conversely, we show that there is no benefit in endowing the processor with more than one poly_mul unit, and we point out a possible trade-off between performance and increased complexity: a 2-way in-order pipeline allows +50% IPC and a 2-way out-of-order pipeline allows +90% IPC. Finally, we show that there are no critical constraints on the latency and pipelining capability of the poly_mul unit.
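To illustrate what a word-level polynomial multiplier such as poly_mul computes, here is a minimal software sketch of carry-less multiplication over GF(2). It is not taken from the paper: the function names `poly_mul_word` and `poly_mul_words`, the 32-bit word size, and the little-endian word order are all assumptions made for the example, and the reduction modulo the field polynomial is left out.

```python
WORD_BITS = 32  # assumed word size (hypothetical; chosen to match a 32-bit ARM core)
WORD_MASK = (1 << WORD_BITS) - 1


def poly_mul_word(a: int, b: int) -> int:
    """Carry-less product of two words viewed as polynomials over GF(2).

    Bit i of each operand is the coefficient of x^i. Addition of
    coefficients is XOR, so no carries propagate between bit positions.
    The result can be up to 2 * WORD_BITS - 1 bits wide.
    """
    a &= WORD_MASK
    b &= WORD_MASK
    result = 0
    while b:
        if b & 1:
            result ^= a  # add (XOR) the shifted multiplicand
        a <<= 1
        b >>= 1
    return result


def poly_mul_words(a_words, b_words):
    """Schoolbook multi-word product built from the word-level multiplier.

    Operands are little-endian lists of words. The result is not reduced
    modulo any field polynomial; that step is omitted in this sketch.
    """
    out = [0] * (len(a_words) + len(b_words))
    for i, aw in enumerate(a_words):
        for j, bw in enumerate(b_words):
            p = poly_mul_word(aw, bw)
            out[i + j] ^= p & WORD_MASK        # low half of the partial product
            out[i + j + 1] ^= p >> WORD_BITS   # high half of the partial product
    return out
```

For example, `poly_mul_word(0b11, 0b11)` gives `0b101`, since (x + 1)^2 = x^2 + 1 over GF(2). In software each word product costs a loop over the bits of one operand, which is the kind of work a single hardware instruction would replace.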