A transpose-free, in-place, SIMD-optimized algorithm for computing large FFTs is introduced and implemented on the Cell Broadband Engine. Six implementations of the algorithm, each using a different data movement method, are described, and their performance is compared for input sizes from 2^17 to 2^21 complex floating-point samples. Large performance differences are observed even among data movement patterns that are theoretically equivalent. All six implementations compare favorably with FFTW and other previous FFT implementations.
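For orientation, here is a minimal sketch of what "in-place" means for an FFT: the textbook iterative radix-2 Cooley-Tukey transform, which overwrites its input array rather than writing to a separate output buffer. This is a swapped-in, generic technique, not the transpose-free algorithm of this paper; it uses no SIMD and does not model Cell data movement. The function name `fft_inplace` is hypothetical.

```python
import cmath


def fft_inplace(x: list[complex]) -> None:
    """Overwrite x with its DFT using an iterative radix-2 Cooley-Tukey FFT.

    Sketch only: len(x) must be a power of two. Apart from loop variables
    and twiddle factors, no storage beyond the array itself is used.
    """
    n = len(x)
    if n & (n - 1):
        raise ValueError("length must be a power of two")

    # Reorder the input into bit-reversed index order, swapping each pair once.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    # Butterfly stages: each stage merges pairs of half-size sub-transforms.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)  # principal size-th root of unity
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = x[start + k]
                v = x[start + k + half] * w
                x[start + k] = u + v
                x[start + k + half] = u - v
                w *= w_step  # advance twiddle factor by recurrence
        size *= 2
```

As a quick check, an impulse `[1, 0, 0, 0, 0, 0, 0, 0]` transforms in place to eight ones. At the sizes studied in the paper, the arithmetic is the same; the hard part on the Cell is moving data between memory levels, which this sketch ignores.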