Parallel 3D FFT is a commonly used numerical method in scientific computing. P3DFFT is a recently proposed implementation of parallel 3D FFT that is designed to scale to massively large systems such as Blue Gene. While recent work has demonstrated such scalability on regular Cartesian meshes (equal length in each dimension), its performance and scalability on flat Cartesian meshes (much smaller length in one dimension) is still a concern. In this paper, we perform studies on a 16-rack (16384-node) Blue Gene/L system that demonstrate that a combination of the network topology and the communication pattern of P3DFFT can result in early network saturation and, consequently, performance loss. We also show that remapping processes on nodes and rotating the mesh, taking the communication properties of P3DFFT into consideration, can help alleviate this problem and improve performance by up to 48% in some special cases.
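
To make the flat-mesh concern concrete, the following is a minimal sketch of a generic 2D (pencil) decomposition model, not P3DFFT's actual code. It assumes a `p_row` x `p_col` process grid in which `p_row` splits the x and y dimensions and `p_col` splits the y and z dimensions across the two transposes, that each transpose is an all-to-all inside one row or column of that grid, and that elements are 16-byte complex values. The function name `pencil_layout` and all of its parameters are hypothetical.

```python
def pencil_layout(nx, ny, nz, p_row, p_col, bytes_per_elem=16):
    """Rough model of a pencil-decomposed 3D FFT (illustrative assumption).

    Returns whether the process grid fits the mesh and the size of each
    point-to-point message in the two all-to-all transposes.
    """
    # Assumed layout: p_row divides x and y, p_col divides y and z,
    # so neither factor may exceed the dimensions it has to split.
    feasible = p_row <= min(nx, ny) and p_col <= min(ny, nz)

    # Elements held by each process (assumes an even split).
    local_elems = (nx * ny * nz) // (p_row * p_col)

    # In an all-to-all over k ranks, each rank sends 1/k of its data to each peer.
    row_msg_bytes = local_elems // p_row * bytes_per_elem
    col_msg_bytes = local_elems // p_col * bytes_per_elem

    return {
        "feasible": feasible,
        "row_msg_bytes": row_msg_bytes,
        "col_msg_bytes": col_msg_bytes,
    }


if __name__ == "__main__":
    # Flat mesh with the short dimension last, then the same mesh rotated.
    print(pencil_layout(1024, 1024, 64, 8, 128))
    print(pencil_layout(64, 1024, 1024, 8, 128))
```

Under these assumptions, an 8 x 128 process grid does not fit a 1024 x 1024 x 64 mesh but does fit the rotated 64 x 1024 x 1024 mesh, which is one simple way to see why mesh orientation and process-grid shape interact. The model says nothing about network topology or saturation, which the paper studies on real hardware.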