Limitations of special-purpose instructions for similarity measurements in media SIMD extensions

  • Authors:
  • Asadollah Shahbahrami; Ben Juurlink; Stamatis Vassiliadis

  • Affiliations:
  • Delft University of Technology, The Netherlands (all three authors)

  • Venue:
  • CASES '06: Proceedings of the 2006 International Conference on Compilers, Architecture and Synthesis for Embedded Systems
  • Year:
  • 2006

Abstract

Microprocessor vendors have provided special-purpose instructions such as psadbw and pdist to accelerate the sum-of-absolute-differences (SAD) similarity measurement. Outside the motion estimation kernel, however, these instructions are of limited use, and this has several drawbacks. First, if the SAD becomes obsolete because a different similarity metric is adopted, these special-purpose instructions are no longer useful. Second, they process only 8-bit subwords, which is not sufficient precision for some kernels, such as motion estimation in the transform domain. In addition, when other n-way parallel SIMD instructions are used to implement the SAD and the sum-of-squared-differences (SSD), the obtained speedup is much less than n, because there is a mismatch between the storage format and the computational format. In this paper, we design and evaluate a variety of SIMD instructions for different data types, and we synthesize special-purpose instructions from a few general-purpose SIMD instructions. We also employ the extended subwords technique, in which every byte of a register has four extra bits, to avoid conversion overhead and to increase parallelism. The results show that using these SIMD instructions together with extended subwords achieves speedups ranging from 10.39 to 14.57 over the C implementations of the SAD, SSD with interpolation, and SSD functions in the motion estimation kernel, whereas MMX achieves speedups ranging from 4.61 to 7.42. Additionally, the proposed SIMD instructions improve the performance of similarity measurement for image histograms by a factor ranging from 8.69 (1-way) to 11.70 (4-way) over C, whereas the MMX speedup ranges from 2.90 (1-way) to 4.33 (4-way).
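To make the SAD and SSD metrics concrete, the following C sketch gives scalar reference versions of both for blocks of 8-bit pixels, plus a scalar model of a psadbw-style operation that sums the absolute differences of eight unsigned bytes into one 16-bit value. The function names (`psadbw_model`, `sad_block`, `sad_block_psadbw`, `ssd_block`), the assumption that both blocks share one row stride, and the modelled instruction semantics are illustrative choices, not the authors' code. The comments point out the value ranges behind the precision argument: a pixel difference needs 9 signed bits, and a single squared difference already needs 16 unsigned bits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Scalar model of a psadbw-like operation. Assumed semantics: add up the
   absolute differences of eight unsigned bytes into one 16-bit result. */
static uint16_t psadbw_model(const uint8_t a[8], const uint8_t b[8]) {
    uint16_t sum = 0;
    for (int i = 0; i < 8; i++) {
        int d = (int)a[i] - (int)b[i];      /* range -255..255: needs 9 bits */
        sum = (uint16_t)(sum + (d < 0 ? -d : d));
    }
    return sum;                             /* at most 8 * 255 = 2040 */
}

/* Reference SAD over a w x h block. Both blocks are assumed to share the
   same row stride, purely to keep the sketch short. */
static uint32_t sad_block(const uint8_t *cur, const uint8_t *ref,
                          int w, int h, int stride) {
    uint32_t sad = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
            sad += (uint32_t)abs((int)cur[y * stride + x] -
                                 (int)ref[y * stride + x]);
    return sad;
}

/* The same SAD computed eight pixels at a time with the psadbw model.
   The block width must be a multiple of 8. */
static uint32_t sad_block_psadbw(const uint8_t *cur, const uint8_t *ref,
                                 int w, int h, int stride) {
    assert(w % 8 == 0);
    uint32_t sad = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x += 8)
            sad += psadbw_model(cur + y * stride + x, ref + y * stride + x);
    return sad;
}

/* Reference SSD. One squared difference can reach 255^2 = 65025, so even
   eight of them overflow a 16-bit accumulator; a 32-bit sum is used. */
static uint32_t ssd_block(const uint8_t *cur, const uint8_t *ref,
                          int w, int h, int stride) {
    uint32_t ssd = 0;
    for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++) {
            int d = (int)cur[y * stride + x] - (int)ref[y * stride + x];
            ssd += (uint32_t)(d * d);
        }
    return ssd;
}
```

For a 16 × 16 motion-estimation block stored with stride 16, `sad_block(cur, ref, 16, 16, 16)` and `sad_block_psadbw(cur, ref, 16, 16, 16)` return the same value. A side calculation, not taken from the paper: a 12-bit subword (a byte plus the four extra bits of the extended subwords technique) can hold a signed 9-bit pixel difference and can accumulate up to 16 absolute differences without overflow, since 16 × 255 = 4080 < 4096 = 2^12.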