We propose two new instructions, swperm and sieve, that can be used to efficiently complete an arbitrary bit-level permutation of an n-bit word with or without repetitions. Permutations with repetitions are rearrangements of an ordered set in which elements may replace other elements in the set; such permutations are useful in cryptographic algorithms. On a four-way superscalar processor, we can complete an arbitrary 64-bit permutation with repetitions of 1-bit subwords in 11 instructions and only four cycles using the two proposed instructions. For subwords of size 4 bits or greater, we can perform an arbitrary permutation with repetitions of a 64-bit register in a single cycle using a single swperm instruction. This improves upon previous results by requiring fewer instructions to permute 4-bit or larger subwords packed in a 64-bit register and fewer execution cycles for 1-bit subwords on wide superscalar processors. We also demonstrate that we can accelerate the performance of the popular DES block cipher using the proposed instructions. We obtain a DES performance improvement of at least 55% in constrained embedded environments and an improvement of 71% on a four-way superscalar processor when applying DES as a cryptographic hash function.
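To make the notion of a permutation with repetitions concrete, the following is a minimal software sketch, not the swperm or sieve instruction itself. It assumes a hypothetical control format in which a list of selectors names, for each destination subword, the source subword to copy, with subword 0 taken as the least significant. The function name `permute_subwords` and its parameters are illustrative assumptions and do not appear in the abstract.

```python
def permute_subwords(word, selectors, subword_bits=4, word_bits=64):
    """Rearrange the subwords of `word`, allowing a source to be reused.

    selectors[d] is the index of the source subword copied into
    destination subword d (index 0 = least significant subword).
    This control format is an assumption made for illustration.
    """
    count = word_bits // subword_bits
    if len(selectors) != count:
        raise ValueError("need exactly one selector per destination subword")
    mask = (1 << subword_bits) - 1
    result = 0
    for dest, src in enumerate(selectors):
        if not 0 <= src < count:
            raise ValueError("selector out of range")
        # Extract the chosen source subword and place it at the destination.
        subword = (word >> (src * subword_bits)) & mask
        result |= subword << (dest * subword_bits)
    return result


# Plain permutation: reverse the sixteen 4-bit subwords of a 64-bit word.
reversed_word = permute_subwords(0x0123456789ABCDEF, list(range(15, -1, -1)))

# Permutation with repetitions: every destination copies source subword 0.
broadcast_word = permute_subwords(0x0123456789ABCDEF, [0] * 16)
```

With `subword_bits=1` the same loop describes an arbitrary bit-level permutation with repetitions, which is the case the abstract reports as needing 11 instructions; the sketch says nothing about how the proposed instructions achieve that in hardware.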