In this paper, we propose a high-performance parallel one-dimensional fast Fourier transform (FFT) algorithm for clusters of vector symmetric multiprocessor (SMP) nodes. The four-step FFT algorithm can be altered into a five-step FFT algorithm to expand the innermost loop length, and we use this five-step algorithm to implement the parallel one-dimensional FFT. Because the proposed parallel algorithm uses a cyclic distribution, all-to-all communication takes place only once. Moreover, both the input data and the output data are in natural order. We report performance results for one-dimensional power-of-two FFTs on clusters of pseudo-vector SMP nodes (Hitachi SR8000). We obtained a performance of over 61 GFLOPS on a 16-node Hitachi SR8000/MPP.
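For readers unfamiliar with the four-step FFT that the five-step variant builds on, the following is a minimal serial sketch in Python with NumPy. It is not the authors' implementation: it shows only the standard four-step decomposition of a length-N transform with N = n1 * n2, and it does not include the five-step modification, the cyclic distribution, or any vectorization. The function name `four_step_fft` and the parameters `n1` and `n2` are hypothetical names chosen for illustration.

```python
import numpy as np


def four_step_fft(x, n1, n2):
    """Serial sketch of the four-step FFT for N = n1 * n2 (illustrative only)."""
    x = np.asarray(x, dtype=complex)
    if x.size != n1 * n2:
        raise ValueError("len(x) must equal n1 * n2")

    # View the input as an n2-by-n1 array: a[j2, j1] = x[j1 + j2 * n1].
    a = x.reshape(n2, n1)

    # Step 1: n1 independent FFTs of length n2 (one per column).
    b = np.fft.fft(a, axis=0)  # b[k2, j1]

    # Step 2: multiply by the twiddle factors exp(-2*pi*i * j1 * k2 / N).
    k2 = np.arange(n2).reshape(n2, 1)
    j1 = np.arange(n1).reshape(1, n1)
    b = b * np.exp(-2j * np.pi * j1 * k2 / (n1 * n2))

    # Step 3: n2 independent FFTs of length n1 (one per row).
    c = np.fft.fft(b, axis=1)  # c[k2, k1]

    # Step 4: transpose so the output is in natural order,
    # X[k2 + k1 * n2] = c[k2, k1].
    return c.T.reshape(-1)
```

A quick check against NumPy's direct transform, for example `np.allclose(four_step_fft(x, 8, 16), np.fft.fft(x))` on a random length-128 input, is one way to confirm the decomposition. In a distributed-memory setting, the transposes implied by the reshaping are where all-to-all communication would occur.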