The ease of programming offered by the CUDA programming model has attracted many programmers to the platform for accelerating non-graphics applications. Cryptography is no exception and has seen its share of exploration efforts, particularly for block ciphers. In this contribution we present a detailed walk-through of an effective mapping of the HC-128 and HC-256 stream ciphers onto GPUs. Because of inherent inter-S-Box dependencies, intra-S-Box dependencies, and the high number of memory accesses needed to generate each keystream word, parallelizing the HC series of stream ciphers remains challenging. We present several optimization strategies, tailored to the CUDA device architecture, for speeding up HC-128 and HC-256. The peak throughput achieved with a single data stream is 0.95 Gbps for HC-128 and 0.41 Gbps for HC-256. Although these single-stream figures fall short of CPU performance (10.9 Gbps for HC-128 and 7.5 Gbps for HC-256), our implementation with multiple parallel data streams is benchmarked at approximately 31 Gbps for HC-128 and 14 Gbps for HC-256 (with 32768 parallel data streams). To the best of our knowledge, this is the first reported effort to map the HC series of stream ciphers onto GPUs.
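To put the reported throughput figures side by side, the following minimal sketch computes the ratios they imply. It uses only the numbers quoted above. The helper names `ratio` and `per_stream_mbps` are hypothetical, and the conversion assumes decimal units (1 Gbps = 1000 Mbps), which the abstract does not specify. The per-stream value is a simple average over all streams, not a measured quantity.

```python
# Throughput figures quoted in the abstract, in Gbps.
GPU_SINGLE = {"HC-128": 0.95, "HC-256": 0.41}   # one data stream on the GPU
CPU = {"HC-128": 10.9, "HC-256": 7.5}           # CPU reference figures
GPU_MULTI = {"HC-128": 31.0, "HC-256": 14.0}    # approximate, 32768 streams
STREAMS = 32768


def ratio(a, b):
    """Return how many times larger a is than b."""
    return a / b


def per_stream_mbps(aggregate_gbps, streams):
    """Average throughput per stream in Mbps (assumes 1 Gbps = 1000 Mbps)."""
    return aggregate_gbps * 1000.0 / streams


for cipher in ("HC-128", "HC-256"):
    cpu_vs_single = ratio(CPU[cipher], GPU_SINGLE[cipher])
    multi_vs_cpu = ratio(GPU_MULTI[cipher], CPU[cipher])
    avg = per_stream_mbps(GPU_MULTI[cipher], STREAMS)
    print(f"{cipher}: CPU / single-stream GPU = {cpu_vs_single:.2f}x, "
          f"multi-stream GPU / CPU = {multi_vs_cpu:.2f}x, "
          f"average per stream = {avg:.3f} Mbps")
```

Under these assumptions, the aggregate advantage over the CPU comes from running many streams concurrently: the average rate of each individual stream in the 32768-stream configuration is far below the single-stream peak.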