The development and integration of a distributed 3D FFT for a cluster of workstations

Authors:
Christopher E. Cramer; John A. Board
Affiliations:
Duke University; Duke University
Venue:
ALS'00: Proceedings of the 4th annual Linux Showcase & Conference - Volume 4
Year:
2000

Cited by (8):

Implementation and performance analysis of non-blocking collective operations for MPI. Proceedings of the 2007 ACM/IEEE conference on Supercomputing.

A study of the effects of machine geometry and mapping on distributed transpose performance. Proceedings of the 5th conference on Computing frontiers.

Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer. IBM Journal of Research and Development.

Scalable framework for 3D FFTs on the Blue Gene/L supercomputer: implementation and early performance measurements. IBM Journal of Research and Development.

Communication analysis of parallel 3D FFT for flat cartesian meshes on large Blue Gene systems. HiPC'08: Proceedings of the 15th international conference on High performance computing.

Performance measurements of the 3D FFT on the Blue Gene/L supercomputer. Euro-Par'05: Proceedings of the 11th international Euro-Par conference on Parallel Processing.

On the communication complexity of 3D FFTs and its implications for Exascale. Proceedings of the 26th ACM international conference on Supercomputing.

Scalable multi-GPU 3-D FFT for TSUBAME 2.0 supercomputer. SC '12: Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis.


Abstract

In this paper, the authors discuss the steps taken to formulate a parallel 3D FFT that scales well on a cluster of fast workstations connected by commodity 100 Mb/s Ethernet. The motivation for this work is to improve the performance and scalability of the Distributed Particle Mesh Ewald (DPME) N-body solver. Scalability issues in the FFT itself and in DPME as an application are presented separately, as are scalability issues related to the cluster's networking hardware. Results indicate that adding a parallel FFT significantly improves DPME on a cluster of workstations: the solver, which previously scaled to a maximum of 5 processors, now scales to at least 24, and its speedup over the serial version rises from 4x to 12x.
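The abstract does not say how the parallel FFT is organized. One common way to distribute a 3D FFT across workstations is a slab decomposition: each process owns a contiguous block of z-planes and transforms them locally in (y, x), an all-to-all exchange then transposes the data so that each process owns a block of y-rows with every z, and a final local 1D transform along z completes the 3D FFT. The Python sketch below simulates that general scheme serially on `p` hypothetical ranks. It is an illustration under that assumption, not the authors' implementation. The function names (`slab_fft3d`, `gather`, and so on) are invented for this example, and a naive O(n^2) DFT stands in for an optimized FFT library. The all-to-all in phase 2 is the only step that moves data between ranks, so it is typically the step most sensitive to network bandwidth.

```python
import cmath
import random


def dft(x):
    """Naive O(n^2) 1D DFT (forward, exp(-2*pi*i*j*k/n)); stands in for a real FFT."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft2d_plane(plane):
    """2D transform of one z-plane stored as plane[y][x]; returns result[ky][kx]."""
    rows = [dft(row) for row in plane]                                   # along x
    ny, nx = len(rows), len(rows[0])
    cols = [dft([rows[y][kx] for y in range(ny)]) for kx in range(nx)]   # along y
    return [[cols[kx][ky] for kx in range(nx)] for ky in range(ny)]


def slab_fft3d(a, p):
    """Simulate a slab-decomposed 3D FFT of a[z][y][x] on p hypothetical ranks.

    Returns out[s][kz][ky_local][kx]: rank s ends up owning ky in [s*by, (s+1)*by).
    """
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    if nz % p or ny % p:
        raise ValueError("p must divide both nz and ny")
    bz, by = nz // p, ny // p

    # Phase 1: rank r owns z-planes [r*bz, (r+1)*bz) and transforms each in (y, x).
    slabs = [[fft2d_plane(a[z]) for z in range(r * bz, (r + 1) * bz)] for r in range(p)]

    # Phase 2: all-to-all transpose. Rank r sends rank s the ky-rows [s*by, (s+1)*by)
    # of every plane it owns; afterwards rank s holds all z for its own ky range.
    # On a real cluster this exchange would go over the network (e.g. MPI_Alltoall).
    pencils = []
    for s in range(p):
        local = [None] * nz
        for r in range(p):
            for zl, plane in enumerate(slabs[r]):
                local[r * bz + zl] = plane[s * by:(s + 1) * by]
        pencils.append(local)

    # Phase 3: the z direction is now entirely local, so each rank transforms along z.
    out = []
    for local in pencils:
        res = [[[0j] * nx for _ in range(by)] for _ in range(nz)]
        for yl in range(by):
            for kx in range(nx):
                col = dft([local[z][yl][kx] for z in range(nz)])
                for kz in range(nz):
                    res[kz][yl][kx] = col[kz]
        out.append(res)
    return out


def gather(out):
    """Reassemble the y-distributed result into one global array g[kz][ky][kx]."""
    return [[row for s in range(len(out)) for row in out[s][kz]]
            for kz in range(len(out[0]))]


def dft3d_direct(a):
    """Brute-force 3D DFT, used only as a reference to check the slab version."""
    nz, ny, nx = len(a), len(a[0]), len(a[0][0])
    return [[[sum(a[z][y][x] * cmath.exp(-2j * cmath.pi *
                                          (kz * z / nz + ky * y / ny + kx * x / nx))
                  for z in range(nz) for y in range(ny) for x in range(nx))
              for kx in range(nx)] for ky in range(ny)] for kz in range(nz)]


if __name__ == "__main__":
    random.seed(1)
    a = [[[complex(random.random()) for _ in range(4)] for _ in range(4)] for _ in range(4)]
    ref = dft3d_direct(a)
    for p in (1, 2, 4):
        got = gather(slab_fft3d(a, p))
        err = max(abs(got[z][y][x] - ref[z][y][x])
                  for z in range(4) for y in range(4) for x in range(4))
        print(f"p={p}: max abs error vs direct DFT = {err:.2e}")
```

Running the script compares the distributed result against a brute-force 3D DFT for p = 1, 2, and 4. The errors should be at floating-point rounding level. The sketch leaves the output distributed by ky rather than by z. This is a common choice in solvers such as PME, where the data is multiplied in reciprocal space and then transformed back: the inverse can run the same phases in reverse, which saves a transpose.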