Improved twiddle access for fast Fourier transforms

  • Authors:
  • Kevin J. Bowers; Ross A. Lippert; Ron O. Dror; David E. Shaw

  • Affiliations:
  • D. E. Shaw Research, New York, NY; D. E. Shaw Research, New York, NY; D. E. Shaw Research, New York, NY; D. E. Shaw Research, New York, NY and Center for Computational Biology and Bioinformatics, Columbia University, New York, NY

  • Venue:
  • IEEE Transactions on Signal Processing
  • Year:
  • 2010

Quantified Score

Hi-index 35.68

Abstract

Optimizing the number of arithmetic operations required in fast Fourier transform (FFT) algorithms has been the focus of extensive research, but memory management is of comparable importance on modern processors. In this article, we investigate two known FFT algorithms, G and GT, that are similar to Cooley-Tukey decimation-in-time and decimation-in-frequency FFT algorithms but that give an asymptotic reduction in the number of twiddle factor loads required for depth-first recursions. The algorithms also allow for aggressive vectorization (even for non-power-of-2 orders) and easier optimization of trivial twiddle factor multiplies. We benchmark G and GT implementations with comparable Cooley-Tukey implementations on commodity hardware. In a comparison designed to isolate the effect of twiddle factor access optimization, these benchmarks show typical speedups ranging from 10% to 65%, depending on transform order, precision, and vectorization. A more heavily optimized implementation of GT yields substantial performance improvements over the widely used code FFTW for many transform orders. The twiddle factor access optimization technique can be generalized to other common FFT algorithms, including real-data FFTs, split-radix FFTs, and multidimensional FFTs.