In this paper we employ two techniques suitable for embedded media processors. The first technique, extended subwords, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids the packing/unpacking conversion overhead caused by the mismatch between storage and computational formats. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many block-based multimedia kernels such as the (I)DCT, the 2x2 Haar transform, and pixel padding. In addition, we propose a few new media instructions. We employ Modified MMX (MMMX), MMX with extended subwords, to evaluate these techniques. Our results show that MMMX combined with an MRF reduces the dynamic number of instructions by up to 80% compared to other multimedia extensions such as MMX.
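The two techniques can be illustrated with a small software model. The sketch below is not the paper's hardware design or instruction set. It assumes that each byte lane is widened to 12 bits (8 data bits plus the four extra bits mentioned above), and it models the MRF as a small 2D array that can be read by row or by column. The names `simd_add`, `MatrixRegisterFile`, `read_row`, and `read_column` are hypothetical and chosen only for illustration.

```python
def wrap_unsigned(value, bits):
    """Keep only the low `bits` bits, as a fixed-width lane would."""
    return value & ((1 << bits) - 1)


def simd_add(a, b, bits):
    """Lane-wise wrapping add of two equal-length lists of subwords.

    `bits` is the lane width: 8 models a plain byte lane, 12 models an
    extended subword (assumption: 8 data bits + 4 extra bits).
    """
    return [wrap_unsigned(x + y, bits) for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Hypothetical model of a register file readable by row or by column."""

    def __init__(self, rows, cols):
        self.cells = [[0] * cols for _ in range(rows)]

    def write_row(self, i, values):
        # Store one register's worth of subwords as row i.
        self.cells[i] = list(values)

    def read_row(self, i):
        # Conventional row-wise register read.
        return list(self.cells[i])

    def read_column(self, j):
        # Column-wise read: element j of every row, with no shuffle step.
        return [row[j] for row in self.cells]


# Extended subwords: sums of 8-bit pixels overflow a byte lane but fit
# in a 12-bit lane, so no unpack to 16 bits is needed before the add.
pixels_a = [200, 255, 10, 128]
pixels_b = [100, 255, 20, 128]
sum_8bit = simd_add(pixels_a, pixels_b, bits=8)    # wraps: [44, 254, 30, 0]
sum_12bit = simd_add(pixels_a, pixels_b, bits=12)  # exact: [300, 510, 30, 256]

# MRF: load a 4x4 block row-wise, then read it column-wise. Reading every
# column in turn yields the transpose of the block.
mrf = MatrixRegisterFile(rows=4, cols=4)
for i in range(4):
    mrf.write_row(i, range(4 * i, 4 * i + 4))
column_1 = mrf.read_column(1)                      # [1, 5, 9, 13]
transposed = [mrf.read_column(j) for j in range(4)]
```

In this model the 12-bit lanes hold the exact sums where the 8-bit lanes wrap around, which is the overflow behaviour the abstract attributes to extended subwords. The column read stands in for the transposition step that block-based kernels such as a 2D (I)DCT otherwise perform with explicit rearrangement instructions.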