Radix-4 FFT implementation using SIMD multimedia instructions

  • Authors:
  • K. Nadehara; T. Miyazaki; I. Kuroda

  • Affiliations:
  • C&C Media Res. Labs., NEC Corp., Kawasaki, Japan

  • Venue:
  • ICASSP '99: Proceedings of the 1999 IEEE International Conference on Acoustics, Speech, and Signal Processing, Volume 04
  • Year:
  • 1999

Abstract

A fast radix-4 complex FFT implementation using 4-parallel SIMD instructions is presented. Four radix-4 butterflies are calculated in parallel at every stage. In all but the last stage, this is done by loading four consecutive elements into a register; at the last stage, every fourth element is packed into a register instead, so the butterflies are still calculated four at a time. This regular data flow enables higher parallelism and reduces the overhead of data-format conversion. Implemented on the V830R processor, which has a 4-parallel SIMD multimedia instruction set, the FFT achieves practical performance competitive with high-end parallel DSPs. The multiply-accumulate instructions with symmetrical rounding introduced in the V830R are effective in maintaining FFT accuracy.
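To make the data flow described in the abstract concrete, here is a minimal Python sketch of a radix-4 FFT in which a list of four complex values stands in for a 4-wide SIMD register. In every stage except the last, each register is loaded with four consecutive elements, so lane k carries butterfly k. In the last stage, where a butterfly's four inputs are adjacent, elements at stride 4 are gathered into each register instead. The choice of decimation in frequency, the floating-point arithmetic, the restriction to N a power of 4 of at least 16, and all function names (`vadd`, `butterfly4`, `fft_radix4_simd`, and so on) are assumptions made for illustration. The sketch does not model the V830R instruction set or the fixed-point format used in the paper.

```python
import cmath

LANES = 4  # emulated SIMD register width: four complex values per "register"


def vadd(a, b):
    """Lane-wise addition of two emulated registers."""
    return [x + y for x, y in zip(a, b)]


def vsub(a, b):
    """Lane-wise subtraction of two emulated registers."""
    return [x - y for x, y in zip(a, b)]


def vmul(a, b):
    """Lane-wise complex multiplication of two emulated registers."""
    return [x * y for x, y in zip(a, b)]


def butterfly4(r0, r1, r2, r3):
    """Four radix-4 DIF butterflies at once: lane k of r0..r3 holds the
    four inputs of butterfly k."""
    t0, t1 = vadd(r0, r2), vsub(r0, r2)
    t2 = vadd(r1, r3)
    t3 = [-1j * v for v in vsub(r1, r3)]  # multiply by W_4 = -j
    return vadd(t0, t2), vadd(t1, t3), vsub(t0, t2), vsub(t1, t3)


def digit_reverse4(k, digits):
    """Reverse the base-4 digits of k (used to unscramble the DIF output)."""
    r = 0
    for _ in range(digits):
        r = (r << 2) | (k & 3)
        k >>= 2
    return r


def fft_radix4_simd(data):
    """Forward DFT, X[k] = sum_t x[t] * exp(-2*pi*i*t*k/N), for N = 4**m, m >= 2."""
    n = len(data)
    digits = 0
    while 4 ** digits < n:
        digits += 1
    if n < 16 or 4 ** digits != n:
        raise ValueError("length must be a power of 4 and at least 16")
    x = [complex(v) for v in data]

    # Stages with span > 4: the quarter length q is a multiple of 4, so loading
    # 4 consecutive elements from each quarter yields 4 independent butterflies.
    span = n
    while span > 4:
        q = span // 4
        for base in range(0, n, span):
            for j in range(0, q, LANES):
                i0 = base + j
                regs = [x[i0 + p * q : i0 + p * q + LANES] for p in range(4)]
                outs = butterfly4(*regs)
                for p in range(4):
                    # Twiddle W_span^(p * index-within-quarter), one per lane.
                    tw = [cmath.exp(-2j * cmath.pi * p * (j + k) / span) for k in range(LANES)]
                    x[i0 + p * q : i0 + p * q + LANES] = vmul(outs[p], tw)
        span = q

    # Last stage (span == 4): a butterfly's inputs are adjacent, so gather every
    # fourth element to line up 4 neighbouring butterflies. Twiddles are all 1.
    for base in range(0, n, 4 * LANES):
        regs = [[x[base + 4 * k + p] for k in range(LANES)] for p in range(4)]
        outs = butterfly4(*regs)
        for p in range(4):
            for k in range(LANES):
                x[base + 4 * k + p] = outs[p][k]

    # DIF leaves the results in base-4 digit-reversed order.
    out = [0j] * n
    for pos in range(n):
        out[digit_reverse4(pos, digits)] = x[pos]
    return out


def dft(data):
    """Direct O(N^2) DFT, used only as a reference for checking."""
    n = len(data)
    return [sum(data[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
            for k in range(n)]


if __name__ == "__main__":
    sig = [complex(i % 5, (3 * i) % 7) for i in range(64)]
    err = max(abs(a - b) for a, b in zip(fft_radix4_simd(sig), dft(sig)))
    print(f"max |FFT - DFT| = {err:.2e}")
```

Running the script compares the result with a direct DFT, and the two should agree to within floating-point round-off. In this layout, no butterfly needs values from another lane, and the only rearrangement is the stride-4 gather in the final stage. This is one plausible reading of the "regular data flow" the abstract credits for its parallelism.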
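The abstract credits multiply-accumulate instructions with symmetrical rounding for preserving FFT accuracy, but it does not define the rounding mode. The sketch below assumes that "symmetrical rounding" means round-to-nearest with ties away from zero, so that rounding commutes with negation. It compares this mode with truncation and add-one-half rounding on a hypothetical Q15 accumulator. The Q15 format, the function names, and the absence of saturation are illustrative assumptions and are not the V830R's documented behavior.

```python
FRAC_BITS = 15               # hypothetical Q15 data format (assumption)
HALF = 1 << (FRAC_BITS - 1)  # half of one result LSB, in accumulator units


def round_truncate(acc):
    """Arithmetic shift right: floor, so errors are biased toward -infinity."""
    return acc >> FRAC_BITS


def round_half_up(acc):
    """Add one half, then shift: ties always go toward +infinity."""
    return (acc + HALF) >> FRAC_BITS


def round_symmetric(acc):
    """Assumed 'symmetrical' rounding: round the magnitude to nearest (ties
    away from zero) and restore the sign, so f(-a) == -f(a)."""
    mag = (abs(acc) + HALF) >> FRAC_BITS
    return -mag if acc < 0 else mag


def mac_q15(pairs, rounder):
    """Sum exact Q15 x Q15 products in a wide accumulator (no saturation
    modelled), then round once back to Q15 with the given mode."""
    acc = 0
    for a, b in pairs:
        acc += a * b
    return rounder(acc)


def total_error(rounder, limit):
    """Sum of rounding errors over every accumulator value in [-limit, limit],
    measured in accumulator units."""
    return sum(rounder(a) * (1 << FRAC_BITS) - a for a in range(-limit, limit + 1))


if __name__ == "__main__":
    for name, fn in [("truncate", round_truncate),
                     ("half-up", round_half_up),
                     ("symmetric", round_symmetric)]:
        print(name, total_error(fn, 1 << 16))
```

Over a range that is symmetric about zero, truncation accumulates a negative total error and add-one-half rounding a positive one, because every tie moves upward. The errors of the symmetric mode cancel exactly. In an FFT, each output passes through several stages of rounded multiply-accumulates, so a one-sided bias of this kind can build up into a systematic offset. This is consistent with the abstract's claim, although the paper's own accuracy analysis is not reproduced here.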