A SIMD Vectorizing Compiler for Digital Signal Processing Algorithms
IPDPS '02 Proceedings of the 16th International Parallel and Distributed Processing Symposium
A fast Fourier transform compiler
ACM SIGPLAN Notices - Best of PLDI 1979-1999
Spiral: A Generator for Platform-Adapted Libraries of Signal Processing Algorithms
International Journal of High Performance Computing Applications
Vectorization techniques for the Blue Gene/L double FPU
IBM Journal of Research and Development
An implementation of parallel 1-D FFT using SSE3 instructions on dual-core processors
PARA'06 Proceedings of the 8th international conference on Applied parallel computing: state of the art in scientific computing
Automatically tuned FFTs for bluegene/l's double FPU
VECPAR'04 Proceedings of the 6th international conference on High Performance Computing for Computational Science
An implementation of parallel 3-d FFT using short vector SIMD instructions on clusters of PCs
PARA'04 Proceedings of the 7th international conference on Applied Parallel Computing: state of the Art in Scientific Computing
Hi-index | 0.00 |
This paper introduces an SIMD vectorization for FFTW-the "fastest Fourier transform in the west" proposed by Frigo and Johnson (see Proceedings of the ACM SIGPLAN '99 , p.169-180, 1999). The new method leads to an architecture independent short vector SIMD FFT vectorization that utilizes the architecture adaptivity of FFTW. It is based on special FFT kernels (up to size 64 and more) that are utilized by FFTW to compute the whole transform. This vectorization supports all features of complex transforms in FFTW (arbitrary size, dimension and stride of the data vector; in-place and out-of-place transforms) and is fully transparent to the user. It is suitable for arbitrary vector sizes of the underlying hardware.