Generating SIMD vectorized permutations

  • Authors:
  • Franz Franchetti;Markus Püschel

  • Affiliations:
  • Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA;Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA

  • Venue:
  • CC'08/ETAPS'08 Proceedings of the Joint European Conferences on Theory and Practice of Software 17th international conference on Compiler construction
  • Year:
  • 2008

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper introduces a method to generate efficient vectorized implementations of small stride permutations using only vector load and vector shuffle instructions. These permutations are crucial for highperformance numerical kernels including the fast Fourier transform. Our generator takes as input only the specification of the target platform's SIMD vector ISA and the desired permutation. The basic idea underlying our generator is to model vector instructions as matrices and sequences of vector instructions as matrix formulas using the Kronecker product formalism. We design a rewriting system and a search mechanism that applies matrix identities to generate those matrix formulas that have vector structure and minimize a cost measure that we define. The formula is then translated into the actual vector program for the specified permutation. For three important classes of permutations, we show that our method yields a solution with the minimal number of vector shuffles. Inserting into a fast Fourier transform yields a significant speedup.