Microprocessor vendors have provided special-purpose instructions such as psadbw and pdist to accelerate the sum-of-absolute-differences (SAD) similarity measurement. These instructions are of little use outside the motion estimation kernel, which has several drawbacks. First, if SAD becomes obsolete because a different similarity metric is adopted, the special-purpose instructions are no longer useful. Second, they process 8-bit subwords only, a precision that is not sufficient for some kernels, such as motion estimation in the transform domain. In addition, when other n-way parallel SIMD instructions are used to implement the SAD and the sum-of-squared-differences (SSD), the obtained speedup is much less than n, because of a mismatch between the storage format and the computational format. In this paper, we design and evaluate a variety of SIMD instructions for different data types. We synthesize special-purpose instructions using a few general-purpose SIMD instructions. In addition, we employ the extended subwords technique to avoid conversion overhead and to increase parallelism. In this technique there are four extra bits for every byte of a register. The results show that using different SIMD instructions and extended subwords achieves a speedup ranging from 10.39 to 14.57 over C for the SAD, SSD with interpolation, and SSD functions in the motion estimation kernel, whereas MMX achieves a speedup ranging from 4.61 to 7.42. Additionally, the proposed SIMD instructions improve the performance of similarity measurement for image histograms by a factor ranging from 8.69 (1-way) to 11.70 (4-way) over C, whereas the MMX speedup is between 2.90 (1-way) and 4.33 (4-way).
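For orientation, the two similarity measures named above can be written as plain scalar C. This is only a minimal sketch of the kind of baseline the reported speedups are measured against, not the paper's code: the function names `sad_block` and `ssd_block`, the 16x16 block size, and the `stride` parameter are all assumptions made for illustration. The comments point out where the intermediate values outgrow the 8-bit storage format, which is one plausible reading of the storage/computation mismatch the abstract describes.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLOCK 16 /* assumed block edge; the abstract does not fix a size */

/* Sum of absolute differences between two BLOCK x BLOCK pixel blocks.
 * stride is the distance in bytes between consecutive rows (hypothetical). */
uint32_t sad_block(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    assert(stride >= BLOCK);
    uint32_t sum = 0;
    for (size_t y = 0; y < BLOCK; y++) {
        for (size_t x = 0; x < BLOCK; x++) {
            /* Pixels are stored in 8 bits, but their difference spans
             * -255..255 and needs 9 bits, so it is widened to int here. */
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            sum += (uint32_t)abs(d);
        }
    }
    return sum;
}

/* Sum of squared differences over the same block layout. */
uint32_t ssd_block(const uint8_t *cur, const uint8_t *ref, size_t stride)
{
    assert(stride >= BLOCK);
    uint32_t sum = 0;
    for (size_t y = 0; y < BLOCK; y++) {
        for (size_t x = 0; x < BLOCK; x++) {
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            /* A squared difference can reach 255 * 255 = 65025 (16 bits). */
            sum += (uint32_t)(d * d);
        }
    }
    return sum;
}
```

With 8-bit SIMD subwords, the widening step in each loop body typically costs explicit unpack and pack instructions. A subword with four extra bits per byte (12 bits) is wide enough to hold the 9-bit signed difference directly, which is consistent with the abstract's claim that extended subwords avoid conversion overhead.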