Computing one-dimensional fast Fourier transforms (FFTs) on microprocessors requires different algorithms, depending on whether the problem fits in the data cache. This paper describes efficient algorithms for both cases. Some implementations of out-of-cache one-dimensional FFTs use a six-step approach to reduce the number of cache misses. The six-step approach can be modified into a seven-step approach that increases data cache reuse. A natural parallelization of these algorithms is also developed. Performance results using these techniques are given for the Hewlett-Packard HP 9000 V-Class V2250 server.
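The abstract does not spell out the individual steps, so the following is only a minimal sketch of the six-step approach as it is commonly formulated: transpose, short FFTs, twiddle multiplication, transpose, short FFTs, transpose. It assumes the length factors as n = n1 * n2. The names `six_step_fft`, `dft`, and `transpose` are hypothetical, and the direct O(n^2) `dft` is a stand-in for whatever tuned in-cache FFT kernel a real implementation would use. The seven-step variant is not shown, since the abstract gives no detail on how it differs.

```python
import cmath

def dft(v):
    """Direct O(n^2) DFT; a stand-in (assumption) for a tuned in-cache FFT kernel."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def transpose(m):
    """Return the transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*m)]

def six_step_fft(x, n1, n2):
    """Length n1*n2 DFT of x via the six-step decomposition (illustrative sketch)."""
    n = n1 * n2
    if len(x) != n:
        raise ValueError("len(x) must equal n1 * n2")
    # View x as an n2 x n1 row-major matrix: a[j2][j1] = x[j2*n1 + j1].
    a = [list(x[r * n1:(r + 1) * n1]) for r in range(n2)]
    a = transpose(a)                       # step 1: now n1 x n2, rows contiguous
    a = [dft(row) for row in a]            # step 2: n1 short transforms of length n2
    a = [[a[j1][k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
          for k2 in range(n2)]
         for j1 in range(n1)]              # step 3: twiddle factors w_n^(j1*k2)
    a = transpose(a)                       # step 4: now n2 x n1
    a = [dft(row) for row in a]            # step 5: n2 short transforms of length n1
    a = transpose(a)                       # step 6: now n1 x n2, a[k1][k2] = X[k1*n2 + k2]
    return [v for row in a for v in row]

if __name__ == "__main__":
    x = [complex(i, -0.5 * i) for i in range(12)]
    y = six_step_fft(x, 3, 4)
    ref = dft(x)
    print(max(abs(p - q) for p, q in zip(y, ref)))  # small rounding error expected
```

The point of this arrangement is that each short transform in steps 2 and 5 works on one contiguous row, which can fit in the data cache even when the full length-n problem does not; the transposes are where the out-of-cache data movement is concentrated.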