FFTs in external or hierarchical memory
The Journal of Supercomputing
Computational frameworks for the fast Fourier transform
Computational frameworks for the fast Fourier transform
Augmenting Loop Tiling with Data Alignment for Improved Cache Performance
IEEE Transactions on Computers - Special issue on cache memory and related problems
A Blocking Algorithm for Parallel 1-D FFT on Shared-Memory Parallel Computers
PARA '02 Proceedings of the 6th International Conference on Applied Parallel Computing Advanced Scientific Computing
High Performance FFT Algorithms for Cache-Coherent Multiprocessors
International Journal of High Performance Computing Applications
FFT program generation for shared memory: SMP and multicore
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Performance of parallel bit-reversal with cilk and UPC for fast fourier transform
GPC'10 Proceedings of the 5th international conference on Advances in Grid and Pervasive Computing
Hi-index | 0.01 |
In this paper, we propose an OpenMP implementation of a recursive algorithm for parallel fast Fourier transform (FFT) on shared-memory parallel computers. A recursive three-step FFT algorithm improves performance by effectively utilizing the cache memory. Performance results of one-dimensional FFTs on the DELL PowerEdge 7150 and the hp workstation zx6000 are reported. We successfully achieved performance of about 757MFLOPS on the DELL PowerEdge 7150 (Itanium 800MHz, 4CPUs) and about 871MFLOPS on the hp workstation zx6000 (Itanium2 1GHz, 2CPUs) for 224-point FFT.