Extended subwords and the matrix register file (MRF) are two microarchitectural techniques that address some of the limitations of existing SIMD architectures. Extended subwords are wider than the data stored in memory: for every byte of data stored in memory, there are four extra bits in the media register file (i.e., 12-bit subwords for 8-bit data). Intermediate results therefore fit in the registers without first being unpacked to a wider type, which avoids the need for data-type conversion instructions. The MRF is a register file organization that provides both conventional row-wise and column-wise access to the register file. In other words, it allows the register file to be viewed as a matrix in which corresponding subwords in different registers form a column. It was introduced to accelerate matrix transposition, which is a very common operation in multimedia applications. In this paper, we show that the MRF is very versatile, since it can also be used for permutations other than matrix transposition. Specifically, we show how it can provide efficient access to strided data, as is needed in, e.g., color space conversion. Furthermore, we show that special-purpose instructions (SPIs) such as the sum-of-absolute-differences (SAD) instruction are of limited usefulness when extended subwords and a few general SIMD instructions that we propose are supported, for two reasons. First, when extended subwords are supported, the SAD instruction provides only a relatively small performance improvement. Second, the SAD instruction processes only 8-bit subwords, which is sufficient neither for quarter-pixel resolution nor for the cost functions used in image and video retrieval. Results obtained by extending the SimpleScalar toolset show that the proposed techniques provide a speedup of up to 3.00 over the MMX architecture. The results also show that using at most 13 extra media registers yields an additional performance improvement ranging from 1.38 to 1.57.
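The C sketch below makes the benefit of extended subwords concrete. It models a media register as eight lanes and averages two rows of 8-bit pixels. This is a scalar emulation, not the paper's instruction set. The following are all hypothetical, chosen for illustration only:

- the register width;
- the 12-bit lane model;
- every function name (`ext_load`, `ext_add`, `ext_shr`, `ext_store`, `avg_rows_ext`, `avg_rows_u8`).

The point is that the 9-bit sum of two bytes fits in a 12-bit subword. With plain 8-bit subwords, the same sum wraps around unless the data is first unpacked to 16 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for illustration only: 8 subwords per media register and
   12-bit extended subwords (8 data bits + 4 extra bits per byte). */
#define LANES 8
#define EXT_BITS 12
#define EXT_MASK ((1u << EXT_BITS) - 1u)

/* Hypothetical register model: each lane is kept in a uint16_t,
   but arithmetic results are masked to 12 bits. */
typedef struct { uint16_t lane[LANES]; } ext_reg;

/* Loading bytes into extended subwords zero-extends them, so no separate
   unpack step (e.g. MMX punpcklbw against a zero register) is needed. */
static ext_reg ext_load(const uint8_t *p) {
    ext_reg r;
    for (int i = 0; i < LANES; i++) r.lane[i] = p[i];
    return r;
}

/* Lane-wise addition modulo 2^12. */
static ext_reg ext_add(ext_reg a, ext_reg b) {
    ext_reg r;
    for (int i = 0; i < LANES; i++)
        r.lane[i] = (uint16_t)((a.lane[i] + b.lane[i]) & EXT_MASK);
    return r;
}

/* Lane-wise logical right shift. */
static ext_reg ext_shr(ext_reg a, int s) {
    assert(s >= 0 && s < EXT_BITS);
    ext_reg r;
    for (int i = 0; i < LANES; i++) r.lane[i] = (uint16_t)(a.lane[i] >> s);
    return r;
}

/* Storing narrows each lane back to one byte; the caller must ensure
   every lane value fits in 8 bits. */
static void ext_store(ext_reg a, uint8_t *p) {
    for (int i = 0; i < LANES; i++) p[i] = (uint8_t)a.lane[i];
}

/* Average two pixel rows: the 9-bit intermediate sum fits in a 12-bit lane. */
static void avg_rows_ext(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    ext_store(ext_shr(ext_add(ext_load(a), ext_load(b)), 1), out);
}

/* The same computation in plain 8-bit lanes: the sum wraps modulo 256. */
static void avg_rows_u8(const uint8_t *a, const uint8_t *b, uint8_t *out) {
    for (int i = 0; i < LANES; i++)
        out[i] = (uint8_t)((uint8_t)(a[i] + b[i]) >> 1);
}
```

For example, averaging 200 and 250 yields 225 with 12-bit lanes, but 97 with wrapping 8-bit lanes.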
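The following sketch shows one way column-wise access can turn interleaved RGB pixels into separate R, G and B vectors. This is the kind of strided access needed in color space conversion. It emulates an 8 × 8 matrix register file in C. The following are assumptions made for illustration:

- the dimensions;
- the partial-column load;
- every name (`mrf`, `mrf_load_col`, `mrf_store_row`, `deinterleave_rgb`).

The paper's actual instructions and instruction sequences may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed dimensions: 8 media registers of 8 subwords each. */
#define NREG 8
#define LANES 8

/* Hypothetical matrix register file: m[r][c] is subword c of register r,
   so a row is a register and a column holds corresponding subwords. */
typedef struct { uint16_t m[NREG][LANES]; } mrf;

/* Column-wise load: write n consecutive bytes from memory into subword c
   of registers r0, r0+1, ..., r0+n-1. */
static void mrf_load_col(mrf *f, int c, int r0, int n, const uint8_t *p) {
    assert(c >= 0 && c < LANES && r0 >= 0 && n >= 0 && r0 + n <= NREG);
    for (int i = 0; i < n; i++) f->m[r0 + i][c] = p[i];
}

/* Conventional row-wise store: write all subwords of register r to memory. */
static void mrf_store_row(const mrf *f, int r, uint8_t *p) {
    assert(r >= 0 && r < NREG);
    for (int c = 0; c < LANES; c++) p[c] = (uint8_t)f->m[r][c];
}

/* Split 8 interleaved RGB pixels (24 bytes, stride 3) into planar vectors.
   Pixel i's (R, G, B) triple goes into column i of registers 0..2, after
   which register 0 holds R0..R7, register 1 holds G0..G7 and register 2
   holds B0..B7. */
static void deinterleave_rgb(const uint8_t *rgb, uint8_t *r, uint8_t *g,
                             uint8_t *b) {
    mrf f = {{{0}}};
    for (int i = 0; i < LANES; i++) mrf_load_col(&f, i, 0, 3, rgb + 3 * i);
    mrf_store_row(&f, 0, r);
    mrf_store_row(&f, 1, g);
    mrf_store_row(&f, 2, b);
}
```

In this model no shuffle or shift operations are needed. The stride-3 permutation falls out of writing by column and reading by row, the same mechanism that yields a transpose.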
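The sketch below contrasts two ways of computing a SAD. With 8-bit subwords, a common MMX idiom obtains |a - b| by combining two unsigned saturating subtractions with OR. With wider subwords, a plain subtraction and absolute value suffice, and the inputs may exceed 8 bits. The C code is a scalar model, and several choices are assumptions:

- `int16_t` stands in for extended subwords.
- Treating interpolated sub-pixel samples as values wider than 8 bits is an assumption for illustration.
- The sum of squared differences (`ssd_wide`) is included only as an example of a retrieval cost function that a dedicated SAD instruction cannot compute.

All function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define LANES 8

/* Unsigned saturating subtraction on one byte, modelled after MMX psubusb. */
static uint8_t subs_u8(uint8_t a, uint8_t b) {
    return a > b ? (uint8_t)(a - b) : 0;
}

/* SAD over 8-bit lanes: one of the two saturating differences is always 0,
   so OR-ing them yields |a - b| without leaving 8-bit precision. */
static unsigned sad8_saturating(const uint8_t *a, const uint8_t *b) {
    unsigned sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += (uint8_t)(subs_u8(a[i], b[i]) | subs_u8(b[i], a[i]));
    return sum;
}

/* SAD over wider lanes (int16_t standing in for extended subwords):
   a plain subtraction plus absolute value suffices, and inputs may use
   more than 8 bits, e.g. higher-precision interpolated samples. */
static unsigned sad_wide(const int16_t *a, const int16_t *b) {
    unsigned sum = 0;
    for (int i = 0; i < LANES; i++)
        sum += (unsigned)abs(a[i] - b[i]);
    return sum;
}

/* Sum of squared differences: an example cost function that a dedicated
   SAD instruction cannot compute. */
static unsigned long long ssd_wide(const int16_t *a, const int16_t *b) {
    unsigned long long sum = 0;
    for (int i = 0; i < LANES; i++) {
        long long d = (long long)a[i] - b[i];
        sum += (unsigned long long)(d * d);
    }
    return sum;
}
```

For 8-bit inputs both SAD variants agree. Only the wide variant accepts samples outside the 0..255 range, and only general SIMD arithmetic covers costs such as the SSD.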