Improved fast software implementation of block ciphers
ICICS '97 Proceedings of the First International Conference on Information and Communication Security
A Compact Rijndael Hardware Architecture with S-Box Optimization
ASIACRYPT '01 Proceedings of the 7th International Conference on the Theory and Application of Cryptology and Information Security: Advances in Cryptology
New Block Encryption Algorithm MISTY
FSE '97 Proceedings of the 4th International Workshop on Fast Software Encryption
Efficient Rijndael Encryption Implementation with Composite Field Arithmetic
CHES '01 Proceedings of the Third International Workshop on Cryptographic Hardware and Embedded Systems
A Fast New DES Implementation in Software
FSE '97 Proceedings of the 4th International Workshop on Fast Software Encryption
Efficient galois field arithmetic on SIMD architectures
Proceedings of the fifteenth annual ACM symposium on Parallel algorithms and architectures
How far can we go on the x64 processors?
FSE'06 Proceedings of the 13th international conference on Fast Software Encryption
A systematic evaluation of compact hardware implementations for the rijndael s-box
CT-RSA'05 Proceedings of the 2005 international conference on Topics in Cryptology
CHES'05 Proceedings of the 7th international conference on Cryptographic hardware and embedded systems
The Salsa20 Family of Stream Ciphers
New Stream Cipher Designs
Light-Weight Instruction Set Extensions for Bit-Sliced Cryptography
CHES '08 Proceeding sof the 10th international workshop on Cryptographic Hardware and Embedded Systems
New AES Software Speed Records
INDOCRYPT '08 Proceedings of the 9th International Conference on Cryptology in India: Progress in Cryptology
Faster and Timing-Attack Resistant AES-GCM
CHES '09 Proceedings of the 11th International Workshop on Cryptographic Hardware and Embedded Systems
WESS '10 Proceedings of the 5th Workshop on Embedded Systems Security
Implementing virtual secure circuit using a custom-instruction approach
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Consecutive S-box lookups: a timing attack on SNOW 3G
ICICS'10 Proceedings of the 12th international conference on Information and communications security
A very compact hardware implementation of the KASUMI block cipher
WISTP'10 Proceedings of the 4th IFIP WG 11.2 international conference on Information Security Theory and Practices: security and Privacy of Pervasive Systems and Smart Devices
A model for structure attacks, with applications to PRESENT and serpent
FSE'12 Proceedings of the 19th international conference on Fast Software Encryption
Lightweight cryptography for the cloud: exploit the power of bitslice implementation
CHES'12 Proceedings of the 14th international conference on Cryptographic Hardware and Embedded Systems
Proceedings of the 2013 International Conference on Principles and Practices of Programming on the Java Platform: Virtual Machines, Languages, and Tools
Secure computing with the MPEG RVC framework
Image Communication
Hi-index | 0.00 |
This paper discusses the state-of-the-art fast software implementation of block ciphers on Intel's new microprocessor Core2, particularly concentrating on "bitslice implementation". The bitslice parallel encryption technique, initially proposed by Biham for speeding-up DES, has been successful on RISC processors with many long registers, but on the other side bitsliced ciphers are not widely used in real applications on PC platforms, because in many cases they were actually not very fast on previous PC processors. Moreover the bitslice mode requires a non-standard data format and hence an additional format conversion is needed for compatibility with an existing parallel mode of operation, which was considered to be expensive.This paper demonstrates that some bitsliced ciphers have a remarkable performance gain on Intel's Core2 processor due to its enhanced SIMD architecture. We show that KASUMI, a UMTS/GSM mobile standard block cipher, can be four times faster when implemented using a bitslice technique on this processor. Also our bitsliced AES code runs at the speed of 9.2 cycles/byte, which is the performance record of AES ever made on a PC processor. Next we for the first time focus on how to optimize a conversion algorithm between a bitslice format and a standard format on a specific processor. As a result, the bitsliced AES code can be faster than a highly optimized "standard AES" code on Core2, even taking an overhead of the conversion into consideration. This means that in the CTR mode, bitsliced AES is not only fast but also fully compatible with an existing implementation and moreover secure against cache timing attacks, since a bitsliced cipher does not use any lookup tables with key/data-dependent address.