Accelerating private-key cryptography via multithreading on symmetric multiprocessors

  • Authors:
  • P. Dongara;T. N. Vijaykumar

  • Affiliations:
  • Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA;Sch. of Electr. & Comput. Eng., Purdue Univ., West Lafayette, IN, USA

  • Venue:
  • ISPASS '03 Proceedings of the 2003 IEEE International Symposium on Performance Analysis of Systems and Software
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Achieving high performance in cryptographic processing is important due to the increasing connectivity among today's computers. Despite steady improvements in microprocessor and system performance, private-key cipher implementations continue to be slow. Irrespective of the cipher used, the main reason for the low performance is lack of parallelism, which fundamentally comes from encryption modes such as the Cipher Block Chaining (CBC) mode. In CBC, each plaintext block is XOR'ed with the previous ciphertext block and then encrypted, essentially inducing a tight recurrence through the ciphertext blocks. To deliver high performance while maintaining high level of security assurance in real systems, the cryptography community has proposed Interleaved Cipher Block Chaining (ICBC) mode. In four-way interleaved chaining, the first, fifth, and every fourth block thereafter are encrypted in CBC mode; the second, sixth, and every fourth block thereafter are encrypted as another stream, and so on. Thus, interleaved chaining loosens the recurrence imposed by CBC, enabling the multiple encryption streams to be overlapped. The number of interleaved chains can be chosen to balance performance and adequate chaining to get good data diffusion. While ICBC was originally proposed to improve hardware encryption rates by employing multiple encryption chips in parallel, this is the first paper to evaluate ICBC via multithreading commonly-used ciphers on a symmetric multiprocessor (SMP). ICBC allows exploiting the full processing power of SMPs, which spend many cycles in cryptographic processing as medium-scale servers today, and will do so as chip-multiprocessor clients in the future. Using the Wisconsin Wind Tunnel II, we show that our multithreaded ciphers achieve encryption rates of 92 Mbytes/s on a 16-processor SMP at 1 GHz, reaching a factor of almost 10 improvement oiler a uniprocessor, which achieves 9 Mbytes/s.