In this paper, we propose a hybrid MPI/OpenMP implementation of a parallel three-dimensional fast Fourier transform (FFT) algorithm for SMP clusters. To reduce the number of cache misses, the three-dimensional FFT algorithm is reorganized into a block three-dimensional FFT algorithm, which we then use to implement the parallel three-dimensional FFT. The implementation achieves over 14 GFLOPS on the AIST Super Cluster M-64 (using 32 nodes out of 132 available, Itanium2 1.3 GHz, 4-way SMP).
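The blocking idea can be illustrated with a minimal sketch. It is not the implementation described in the paper: the function names (`fft1d`, `block_fft3d`), the block-width parameter `nb`, and the storage layout (a flat array with element (i, j, k) at index i + n1*(j + n2*k)) are all assumptions made for illustration. For the two strided axes, a block of up to `nb` neighbouring lines is gathered into a small contiguous work buffer, transformed there, and scattered back, so that in a compiled implementation each cache line fetched from the large array would be reused across the block. Pure Python shows only the access pattern, not the cache benefit, and the sketch contains no MPI or OpenMP parallelism.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def block_fft3d(a, n1, n2, n3, nb=4):
    """In-place 3D FFT of the flat list `a` (assumed layout: i fastest).

    `nb` is a hypothetical block width: the number of neighbouring
    strided lines gathered into the work buffer at once.
    """
    def idx(i, j, k):
        return i + n1 * (j + n2 * k)

    # Axis 1 is contiguous in memory, so each line is transformed directly.
    for k in range(n3):
        for j in range(n2):
            base = idx(0, j, k)
            a[base:base + n1] = fft1d(a[base:base + n1])

    # Axis 2 has stride n1: gather a block of lines, transform, scatter back.
    for k in range(n3):
        for i0 in range(0, n1, nb):
            width = min(nb, n1 - i0)  # the last block may be narrower
            work = [[a[idx(i0 + ii, j, k)] for j in range(n2)]
                    for ii in range(width)]
            work = [fft1d(line) for line in work]
            for ii in range(width):
                for j in range(n2):
                    a[idx(i0 + ii, j, k)] = work[ii][j]

    # Axis 3 has stride n1*n2: same gather / transform / scatter pattern.
    for j in range(n2):
        for i0 in range(0, n1, nb):
            width = min(nb, n1 - i0)
            work = [[a[idx(i0 + ii, j, k)] for k in range(n3)]
                    for ii in range(width)]
            work = [fft1d(line) for line in work]
            for ii in range(width):
                for k in range(n3):
                    a[idx(i0 + ii, j, k)] = work[ii][k]
    return a
```

In a real implementation the block width would be tuned so that the work buffer fits in cache; the value 4 used as the default here is arbitrary.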