On the Power of Bitslice Implementation on Intel Core2 Processor

  • Authors:
  • Mitsuru Matsui;Junko Nakajima

  • Affiliations:
  • Information Technology R&D Center, Mitsubishi Electric Corporation, 5-1-1 Ofuna Kamakura Kanagawa, Japan;Information Technology R&D Center, Mitsubishi Electric Corporation, 5-1-1 Ofuna Kamakura Kanagawa, Japan

  • Venue:
  • CHES '07 Proceedings of the 9th international workshop on Cryptographic Hardware and Embedded Systems
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper discusses the state-of-the-art fast software implementation of block ciphers on Intel's new microprocessor Core2, particularly concentrating on "bitslice implementation". The bitslice parallel encryption technique, initially proposed by Biham for speeding-up DES, has been successful on RISC processors with many long registers, but on the other side bitsliced ciphers are not widely used in real applications on PC platforms, because in many cases they were actually not very fast on previous PC processors. Moreover the bitslice mode requires a non-standard data format and hence an additional format conversion is needed for compatibility with an existing parallel mode of operation, which was considered to be expensive.This paper demonstrates that some bitsliced ciphers have a remarkable performance gain on Intel's Core2 processor due to its enhanced SIMD architecture. We show that KASUMI, a UMTS/GSM mobile standard block cipher, can be four times faster when implemented using a bitslice technique on this processor. Also our bitsliced AES code runs at the speed of 9.2 cycles/byte, which is the performance record of AES ever made on a PC processor. Next we for the first time focus on how to optimize a conversion algorithm between a bitslice format and a standard format on a specific processor. As a result, the bitsliced AES code can be faster than a highly optimized "standard AES" code on Core2, even taking an overhead of the conversion into consideration. This means that in the CTR mode, bitsliced AES is not only fast but also fully compatible with an existing implementation and moreover secure against cache timing attacks, since a bitsliced cipher does not use any lookup tables with key/data-dependent address.