In this paper, we propose an implementation of a parallel three-dimensional fast Fourier transform (FFT) with two-dimensional decomposition on a massively parallel cluster of multi-core processors. The proposed parallel three-dimensional FFT algorithm is based on the multicolumn FFT algorithm. We show that a two-dimensional decomposition effectively improves performance by reducing the communication time for larger numbers of MPI processes. We successfully achieved a performance of over 401 GFlops on 256 nodes of Appro Xtreme-X3 (648 nodes, 147.2 GFlops/node, 95.4 TFlops peak performance) for 256^3-point FFT.
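To illustrate why a two-dimensional decomposition can help at larger process counts, the following is a minimal sketch of the layout arithmetic only, not the implementation described above. It assumes an n x n x n array distributed over a hypothetical p_rows x p_cols process grid, with n divisible by both; the function names and the 16 x 16 grid are illustrative assumptions. In this standard layout, each transpose is an all-to-all within one row or one column of the process grid rather than across every process, and the process count is not capped at n as it is when whole planes are assigned to each process.

```python
def slab_max_processes(n):
    """One-dimensional decomposition: each process owns whole planes,
    so at most n processes can hold data for an n x n x n array."""
    return n


def pencil_layout(n, p_rows, p_cols):
    """Two-dimensional decomposition of an n x n x n array over a
    p_rows x p_cols process grid (hypothetical helper, for illustration).

    Simplifying assumption: n is divisible by p_rows and p_cols.
    """
    if n % p_rows or n % p_cols:
        raise ValueError("this sketch assumes n is divisible by p_rows and p_cols")
    return {
        "processes": p_rows * p_cols,
        # Each process holds complete columns along one axis and a
        # rectangular block of the other two axes.
        "local_shape": (n, n // p_rows, n // p_cols),
        # Number of processes taking part in each of the two transposes:
        # one all-to-all per grid row group, one per grid column group.
        "alltoall_group_sizes": (p_rows, p_cols),
        # Upper bound on usable processes with this decomposition.
        "max_processes": n * n,
    }


# Illustrative values: a 256^3-point transform on an assumed 16 x 16 grid.
layout = pencil_layout(256, 16, 16)
print("slab limit:", slab_max_processes(256))
print("pencil layout:", layout)
```

With these assumed values, each all-to-all involves 16 processes instead of all 256, which is the kind of reduction in communication partners that the two-dimensional decomposition is meant to provide.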