The Design of Rijndael
A Scalable and Unified Multiplier Architecture for Finite Fields GF(p) and GF(2m)
CHES '00 Proceedings of the Second International Workshop on Cryptographic Hardware and Embedded Systems
Efficient Software Implementation of AES on 32-Bit Platforms
CHES '02 Revised Papers from the 4th International Workshop on Cryptographic Hardware and Embedded Systems
Guide to Elliptic Curve Cryptography
Guide to Elliptic Curve Cryptography
Speeding Up AES By Extending a 32 bit Processor Instruction Set
ASAP '06 Proceedings of the IEEE 17th International Conference on Application-specific Systems, Architectures and Processors
An instruction set extension for fast and memory-efficient AES implementation
CMS'05 Proceedings of the 9th IFIP TC-6 TC-11 international conference on Communications and Multimedia Security
Accelerating AES using instruction set extensions for elliptic curve cryptography
ICCSA'05 Proceedings of the 2005 international conference on Computational Science and Its Applications - Volume Part II
Instruction set extensions for efficient AES implementation on 32-bit processors
CHES'06 Proceedings of the 8th international conference on Cryptographic Hardware and Embedded Systems
A hardware processor supporting elliptic curve cryptography for less than 9 kGEs
CARDIS'11 Proceedings of the 10th IFIP WG 8.8/11.2 international conference on Smart Card Research and Advanced Applications
FPGA based unified architecture for public key and private key cryptosystems
Frontiers of Computer Science: Selected Publications from Chinese Universities
Secure outsourced computation of iris matching
Journal of Computer Security
Hi-index | 0.00 |
Embedded systems require efficient yet flexible implementations of cryptographic primitives with a minimal impact on the overall cost of a device. In this paper we present the design of a functional unit (FU) for accelerating the execution of cryptographic software on 32-bit processors. The FU is basically a multiply-accumulate (MAC) unit able to perform multiplications and MAC operations on integers and binary polynomials. Polynomial arithmetic is a performance-critical building block of numerous cryptosystems using binary extension fields, including public-key primitives based on elliptic curves (e.g. ECDSA), symmetric ciphers (e.g. AES or Twofish), and hash functions (e.g. Whirlpool). We integrated the FU into the Leon2 SPARC V8 core and prototyped the extended processor in an FPGA. All operations provided by the FU are accessible to the programmer through custom instructions. Our results show that the FU allows to accelerate the execution of 128-bit AES by up to 78% compared to a conventional software implementation using only native SPARC V8 instructions. Moreover, the custom instructions reduce the code size by up to 87.4%. The FU increases the silicon area of the Leon2 core by just 8,352 gates and has almost no impact on its cycle time.