Commodity word-oriented microprocessors do not support advanced bit manipulation operations efficiently. Programmers typically devise tricks to shorten the long instruction sequences needed to emulate these complicated operations. Because these operations are relevant to applications that are becoming increasingly important, we propose direct support for them in microprocessors. In particular, we propose fast bit gather (or parallel extract), bit scatter (or parallel deposit), and bit permutation instructions (including group, butterfly, and inverse butterfly). We show that all of these instructions can be implemented efficiently using the fast butterfly and inverse butterfly network datapaths. Specifically, we show that parallel deposit can be mapped onto a butterfly circuit and parallel extract onto an inverse butterfly circuit. We define static, dynamic, and loop-invariant versions of the instructions; the static versions use a much simpler functional unit. We show how a hardware decoder can be implemented for the dynamic and loop-invariant versions to generate, at run time, the control signals for the butterfly and inverse butterfly datapaths. The simplest functional unit we propose is smaller and faster than an ALU. We also show that these instructions yield significant speedups over a basic RISC architecture for a variety of application kernels drawn from application domains including bioinformatics, steganography, coding, compression, and random number generation.
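To make the semantics of bit gather and bit scatter concrete, the following is a minimal software sketch of the two operations. It is not the butterfly-network implementation proposed here; it is a bit-at-a-time reference model of the kind of emulation loop that such instructions are meant to replace. The function names `pext` and `pdep` and their argument order are assumptions chosen for illustration.

```python
def pext(x: int, mask: int) -> int:
    """Bit gather (parallel extract), reference model.

    Collects the bits of x at positions where mask has a 1 and packs
    them, in order, into the low-order bits of the result.
    """
    result = 0
    out_pos = 0
    while mask:
        low = mask & -mask        # isolate the lowest set bit of mask
        if x & low:
            result |= 1 << out_pos
        out_pos += 1
        mask &= mask - 1          # clear the lowest set bit of mask
    return result


def pdep(x: int, mask: int) -> int:
    """Bit scatter (parallel deposit), reference model.

    Takes the low-order bits of x, in order, and places them at the
    positions where mask has a 1. All other result bits are 0.
    """
    result = 0
    in_pos = 0
    while mask:
        low = mask & -mask        # isolate the lowest set bit of mask
        if (x >> in_pos) & 1:
            result |= low
        in_pos += 1
        mask &= mask - 1          # clear the lowest set bit of mask
    return result


if __name__ == "__main__":
    print(hex(pext(0xABCD, 0x0F0F)))  # 0xbd
    print(hex(pdep(0xBD, 0x0F0F)))    # 0xb0d
```

Each call loops once per set bit in the mask, which illustrates why emulating these operations on a word-oriented processor takes a long instruction sequence. In this model the two operations are inverses on the masked bits: `pdep(pext(x, m), m)` equals `x & m`.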