Avoiding conversion and rearrangement overhead in SIMD architectures

  • Authors:
  • Asadollah Shahbahrami;Ben Juurlink;Demid Borodin;Stamatis Vassiliadis

  • Affiliations:
  • Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft Univ. of Technol., The Netherlands and Dept. of Elec. and Comp. Eng., Fac. of Eng., Gui ...;Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands;Computer Engineering Laboratory, Faculty of Electrical Engineering, Mathematics, and Computer Science, Delft University of Technology, The Netherlands

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 2006

Abstract

Single-Instruction Multiple-Data (SIMD) instructions provide an inexpensive way to exploit the data-level parallelism in multimedia applications. However, the performance improvement obtained by employing SIMD instructions is often limited, because many overhead instructions are frequently required to bring data into a form amenable to SIMD processing. In this paper, we employ two techniques to overcome this limitation. The first technique, extended sub-words, uses four extra bits for every byte in a media register. This allows many SIMD operations to be performed without overflow and avoids packing/unpacking conversion overhead. The second technique, the Matrix Register File (MRF), allows flexible row-wise as well as column-wise access to the register file. It is useful for many two-dimensional multimedia algorithms such as the (Inverse) Discrete Cosine Transform, the 2 × 2 Haar transform, and pixel padding. In addition, we propose a few new media instructions. Experimental results obtained by extending the SimpleScalar toolset show that these techniques improve performance by up to a factor of 4.5 compared to a conventional SIMD instruction set extension.
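
The two techniques named in the abstract can be illustrated with a small software model. The sketch below is only a toy under stated assumptions: it is not the authors' hardware design or instruction set, and every name in it (wrap, simd_add, MatrixRegisterFile, read_row, read_col) is hypothetical. It assumes unsigned lanes, models a conventional byte lane as 8 bits and an extended sub-word as 8 + 4 = 12 bits, and models the register file as a small matrix that can be read by row or by column.

```python
# Toy model only: names and lane semantics are assumptions, not the paper's ISA.
BYTE_BITS = 8   # conventional SIMD byte lane
EXT_BITS = 12   # extended sub-word: 8 data bits plus 4 extra bits per byte


def wrap(value, bits):
    """Keep only the low `bits` bits, as a fixed-width unsigned lane would."""
    return value & ((1 << bits) - 1)


def simd_add(a, b, bits):
    """Lane-wise add of two equal-length lists using `bits`-wide unsigned lanes."""
    return [wrap(x + y, bits) for x, y in zip(a, b)]


class MatrixRegisterFile:
    """Hypothetical register file whose registers form the rows of a matrix."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def read_row(self, i):
        # Conventional access: one register is one row.
        return list(self.rows[i])

    def read_col(self, j):
        # Column-wise access: element j of every register, with no
        # explicit transpose or shuffle step in between.
        return [r[j] for r in self.rows]


pixels_a = [200, 250, 17, 128]
pixels_b = [100, 10, 3, 128]

# 8-bit lanes overflow, so real code would first unpack to wider lanes.
narrow = simd_add(pixels_a, pixels_b, BYTE_BITS)
# 12-bit lanes hold the full sums, so no unpack/pack step is needed.
wide = simd_add(pixels_a, pixels_b, EXT_BITS)

mrf = MatrixRegisterFile([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
second_column = mrf.read_col(1)
```

In this model, the 8-bit addition loses the upper bits of any sum of 256 or more, while the 12-bit addition keeps them, which is the overflow that unpacking to wider lanes normally guards against. The read_col method stands in for the column-wise access that lets two-dimensional algorithms skip a separate transposition step.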