Performance analysis of the FFT algorithm on a shared-memory parallel architecture
IBM Journal of Research and Development
FFTs in external or hierarchical memory
The Journal of Supercomputing
A fast Fourier transform compiler
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Parallel Implementation of Multidimensional Transforms without Interprocessor Communication
IEEE Transactions on Computers
A comparison of optimal FFTs on torus and hypercube multicomputers
Parallel Computing
A Parallel Algorithm for 2-D DFT Computation with No Interprocessor Communication
IEEE Transactions on Parallel and Distributed Systems
The Scalability of FFT on Parallel Computers
IEEE Transactions on Parallel and Distributed Systems
Message Passing Vs. Shared Address Space on a Cluster of SMPs
IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
The Fastest Fourier Transform in the West
A parallel 1-D FFT algorithm for the Hitachi SR8000
Parallel Computing
IEEE Transactions on Signal Processing
A parallel implementation of the 2-D discrete wavelet transform without interprocessor communications
IEEE Transactions on Signal Processing
A framework for low-communication 1-D FFT
SC '12 Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis
A framework for low-communication 1-D FFT
Scientific Programming - Selected Papers from Super Computing 2012
Computing a 1-D Fast Fourier Transform (FFT) with the conventional 4-step FFT on parallel computers requires intensive all-to-all communication, which can degrade FFT performance. In this paper, we present the 2-step-no-communication and 3-step-no-communication algorithms, which are parallel algorithms for the 1-D FFT that require no interprocessor communication. One of the main advantages of these algorithms is that they eliminate all-to-all communication between processors, at the expense of more computation than the conventional 4-step FFT. If the cost of all-to-all communication in the 4-step FFT more than offsets the extra computation required by the 2-step-no-communication and 3-step-no-communication algorithms, these two no-communication algorithms will outperform the 4-step FFT. We test both no-communication algorithms on two parallel systems with different relative costs of all-to-all communication and computation: a 32-node Beowulf cluster and an 8-node symmetric multiprocessor (SMP). The experimental results show that on the SMP the no-communication algorithms outperform the 4-step FFT only for relatively small data sizes, whereas on the Beowulf cluster they outperform it for all data sizes tested.
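The trade-off described above can be illustrated by contrasting the index decomposition of the conventional 4-step FFT with one possible communication-free split of the same 1-D DFT. This is a minimal sequential sketch under stated assumptions: all function names (`twiddle`, `dft`, `four_step_fft`, `no_comm_outputs`, `no_comm_fft`) are hypothetical, a direct O(n^2) DFT stands in for an optimized local FFT, and the no-communication variant assumes every processor already holds a full copy of the input. It is not necessarily the 2-step-no-communication algorithm presented in this paper.

```python
import cmath


def twiddle(n, e):
    """Return the twiddle factor W_n^e = exp(-2*pi*i*e/n)."""
    return cmath.exp(-2j * cmath.pi * e / n)


def dft(x):
    """Direct O(n^2) DFT; stands in for an optimized local FFT routine."""
    n = len(x)
    return [sum(x[j] * twiddle(n, j * k) for j in range(n)) for k in range(n)]


def four_step_fft(x, n1, n2):
    """Sequential sketch of the conventional 4-step FFT for N = n1 * n2."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 independent n2-point DFTs over the strided columns x[a + n1*b].
    cols = [dft([x[a + n1 * b] for b in range(n2)]) for a in range(n1)]
    # Step 2: multiply each element by the twiddle factor W_N^(a*k2).
    for a in range(n1):
        for k2 in range(n2):
            cols[a][k2] *= twiddle(n, a * k2)
    # Step 3: transpose, then n2 independent n1-point DFTs. On a
    # distributed-memory machine this transpose is the all-to-all exchange.
    rows = [dft([cols[a][k2] for a in range(n1)]) for k2 in range(n2)]
    # Step 4: store X[k2 + n2*k1] in natural order.
    out = [0j] * n
    for k2 in range(n2):
        for k1 in range(n1):
            out[k2 + n2 * k1] = rows[k2][k1]
    return out


def no_comm_outputs(x, p, num_procs):
    """Outputs X[num_procs*q + p], q = 0..N/num_procs - 1, for 'processor' p.

    Hypothetical setup: every processor holds a full copy of x, so no data
    has to be exchanged.
    """
    n = len(x)
    m = n // num_procs
    assert m * num_procs == n
    # Stage A: O(N) local work over the whole input (extra computation that
    # replaces the all-to-all exchange), followed by a twiddle multiply.
    y = [twiddle(n, j * p)
         * sum(x[j + m * r] * twiddle(num_procs, r * p) for r in range(num_procs))
         for j in range(m)]
    # Stage B: one m-point DFT yields this processor's strided outputs.
    return dft(y)


def no_comm_fft(x, num_procs):
    """Simulate num_procs processors that each compute their outputs alone."""
    out = [0j] * len(x)
    for p in range(num_procs):  # each iteration could run on its own processor
        for q, v in enumerate(no_comm_outputs(x, p, num_procs)):
            out[num_procs * q + p] = v
    return out


x = [complex(i, (i * i) % 7) for i in range(12)]
reference = dft(x)
print(max(abs(a - b) for a, b in zip(four_step_fft(x, 3, 4), reference)) < 1e-9)  # True
```

In the 4-step sketch, the transpose before step 3 is where a distributed implementation must perform an all-to-all exchange. In the communication-free sketch, each of P processors instead does O(N) work in stage A (it reads the entire input) plus one (N/P)-point transform, compared with roughly O((N/P) log N) work for its share of the 4-step FFT; that extra computation is what the saved communication has to outweigh.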