Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware

  • Authors: Hoseok Chang; Wonyong Sung

  • Affiliations: Seoul National University, Seoul, South Korea; Seoul National University, Seoul, South Korea

  • Venue: CASES '08: Proceedings of the 2008 International Conference on Compilers, Architectures and Synthesis for Embedded Systems
  • Year: 2008

Abstract

Automatic vectorization of programs for partitioned-ALU SIMD (Single Instruction Multiple Data) processors has been difficult not only because of data dependency issues but also because of non-aligned and irregular data access problems. A non-aligned or irregular data access operation incurs many overhead cycles for data alignment. It also makes efficient code generation difficult and hinders automatic vectorization. In this paper, we employ two special memory access hardware units to improve the performance of SIMD processors: the split line buffer and the packing buffer. The former solves the non-aligned memory access problem, while the latter simplifies irregular and strided data access. Adding these hardware units requires only very small changes to the instruction set architecture, yet it yields a significant performance improvement by allowing more loops to be vectorized and by reducing the overhead cycles. We have also developed an auto-vectorizing compiler that utilizes these special hardware units. Experiments comparing the proposed method with the conventional one show a 50% increase in the number of vectorized loops and a 77% increase in the total performance of an MPEG2 encoder program.
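
To make the two access problems concrete, the following C sketch models in scalar code what a conventional SIMD target has to do in software. It is only an illustration under stated assumptions, not the paper's hardware or compiler: the lane count `LANES` and the function names `load_unaligned` and `load_strided` are hypothetical. The first function emulates a non-aligned vector load by fetching the two aligned words that straddle the address and merging them; the second gathers elements separated by a fixed stride into one packed vector. These are the kinds of access the abstract assigns to the split line buffer and the packing buffer, respectively.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: one SIMD word holds eight 16-bit lanes. */
enum { LANES = 8 };

/* Non-aligned load done the conventional way: read the aligned word
 * containing mem[index], read the next aligned word if the access
 * crosses the boundary, then select LANES consecutive elements.
 * The caller must guarantee that every word read lies inside mem. */
static void load_unaligned(const int16_t *mem, size_t index,
                           int16_t out[LANES])
{
    size_t offset = index % LANES;   /* misalignment in elements */
    size_t base = index - offset;    /* aligned start address    */
    int16_t lo[LANES];
    int16_t hi[LANES] = {0};

    assert(offset < LANES);
    memcpy(lo, mem + base, sizeof lo);
    if (offset != 0)                 /* second aligned access    */
        memcpy(hi, mem + base + LANES, sizeof hi);

    for (size_t i = 0; i < LANES; i++)
        out[i] = (offset + i < LANES) ? lo[offset + i]
                                      : hi[offset + i - LANES];
}

/* Strided load: collect every stride-th element starting at
 * mem[index] and pack the results into consecutive lanes. */
static void load_strided(const int16_t *mem, size_t index,
                         size_t stride, int16_t out[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        out[i] = mem[index + i * stride];
}
```

In this model, each non-aligned load costs two aligned memory reads plus a merge step, and each strided load touches up to `LANES` separate locations. Overhead of this kind is what the abstract describes as hindering vectorization on conventional hardware.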