Measuring the Performance of Multimedia Instruction Sets
IEEE Transactions on Computers
Fast Subword Permutation Instructions Using Omega and Flip Network Stages
ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Scalable Parallel Memory Architectures for Video Coding
Journal of VLSI Signal Processing Systems
PLX: An Instruction Set Architecture and Testbed for Multimedia Information Processing
Journal of VLSI Signal Processing Systems
Matrix register file and extended subwords: two techniques for embedded media processors
Proceedings of the 2nd conference on Computing frontiers
A Low-Power Multithreaded Processor for Software Defined Radio
Journal of VLSI Signal Processing Systems
Avoiding conversion and rearrangement overhead in SIMD architectures
International Journal of Parallel Programming
Accelerating the Whirlpool Hash Function Using Parallel Table Lookup and Fast Cyclical Permutation
Fast Software Encryption
Fast Bit Gather, Bit Scatter and Bit Permutation Instructions for Commodity Microprocessors
Journal of Signal Processing Systems
An Enhanced DMA Controller in SIMD Processors for Video Applications
ARCS '09 Proceedings of the 22nd International Conference on Architecture of Computing Systems
Hi-index | 0.00 |
MicroSIMD architectures incorporating subword parallelism are very efficient for application-specific media processors as well as for fast multimedia information processing in general-purpose processors. This paper addresses the unsolved problem of the need to permute the subwords packed in registers for maximum parallelism performance, especially for two-dimensional (2-D) multimedia algorithms. We propose a new systematic approach for identifying the fundamental data rearrangement needs in current and future 2-D pixel processing programs based on the hierarchical decomposition of frames and objects into atomic 2-D structures. We define new subword permutation instructions, Check, Excheck, Exchange, and Permset that achieve these data rearrangements across multiple registers. We also define an alphabet of subword permutation primitives, including these new instructions and the Mix instruction defined for PA-RISC MAX-2 and IA-64, which supports the data rearrangement needs of 2-D frames and objects. We show the sufficiency and efficiency of this alphabet for achieving all possible permutations of hierarchical 2-D blocks.