IBM Journal of Research and Development
PPAM'09 Proceedings of the 8th international conference on Parallel processing and applied mathematics: Part I
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
A hybrid MPI/OpenMP implementation of a parallel 3-d FFT on SMP clusters
PPAM'05 Proceedings of the 6th international conference on Parallel Processing and Applied Mathematics
With the inclusion of non-blocking global collective operations in the MPI 3.0 draft specification, many fundamental algorithms, such as those for performing three-dimensional (3D) FFTs, will be modified to take advantage of non-blocking collectives. To be used routinely by HPC application users, such modifications must be suitable for incorporation in general-purpose FFT libraries. Here we present a general-purpose algorithmic strategy that uses non-blocking collective communications in the calculation of a single parallel 3D FFT. In this scheme, the global collective communication is partitioned into blocking and non-blocking components so that communication overlaps with computation during the 3D FFT calculation. We present benchmarks of the scheme for single-variable 3D FFTs on two architectures: (a) HECToR, a Cray XE6 machine, and (b) a Fujitsu PRIMERGY Intel Westmere cluster with an InfiniBand interconnect.
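The overlap idea can be illustrated with a minimal single-process sketch. It is not the scheme benchmarked here, only one plausible way to pipeline the work under stated assumptions: the input is a cubic array indexed as `a[x][y][z]`, a naive DFT stands in for a real FFT kernel, and a background thread stands in for a non-blocking all-to-all (in MPI 3.0 this role would be played by `MPI_Ialltoall`, completed with `MPI_Wait`). The names `fft3d_overlapped`, `exchange`, and the `chunk` parameter are hypothetical.

```python
import cmath
from concurrent.futures import ThreadPoolExecutor


def dft(v):
    """Naive O(n^2) 1D DFT; a stand-in for a real FFT kernel."""
    n = len(v)
    return [sum(v[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft2_plane(plane):
    """Transform one x-plane (indexed [y][z]) along z, then along y."""
    rows = [dft(r) for r in plane]              # rows[y][kz]
    cols = [dft(list(c)) for c in zip(*rows)]   # cols[kz][ky]
    return [list(r) for r in zip(*cols)]        # result[ky][kz]


def exchange(planes, x0, pencils):
    """Stand-in for the global transpose: copy planes into x-pencils [y][z][x]."""
    for dx, plane in enumerate(planes):
        for y, row in enumerate(plane):
            for z, val in enumerate(row):
                pencils[y][z][x0 + dx] = val


def fft3d_overlapped(a, chunk=2):
    """3D DFT of a cubic array, overlapping the transpose with plane transforms."""
    n = len(a)
    pencils = [[[0j] * n for _ in range(n)] for _ in range(n)]
    pending = None
    with ThreadPoolExecutor(max_workers=1) as comm:
        for x0 in range(0, n, chunk):
            # Compute this chunk while the previous chunk is still "in flight".
            done = [fft2_plane(p) for p in a[x0:x0 + chunk]]
            if pending is not None:
                pending.result()                # analogous to MPI_Wait
            pending = comm.submit(exchange, done, x0, pencils)
        # Nothing is left to overlap with the last chunk, so this wait blocks.
        pending.result()
    # Final 1D transforms along x on the transposed data.
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for y in range(n):
        for z in range(n):
            col = dft(pencils[y][z])
            for x in range(n):
                out[x][y][z] = col[x]
    return out
```

In this sketch every chunk except the last has its exchange hidden behind the plane transforms of the following chunk, while the final exchange has no remaining computation to overlap with and so behaves like a blocking step. The chunk size sets the trade-off between the number of messages and the amount of overlap.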