Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32x32x32 FFT in 3.7 microseconds and a 64x64x64 FFT in 13.3 microseconds on a 512-node configuration. This is an order of magnitude faster than any other FFT implementation of which we are aware. Achieving this performance requires a coordinated combination of computation and communication techniques that exploit Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides more than 300 gigabits per second of bandwidth per node, message latencies in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation subsystem incorporates primitives that support efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, several of the hardware primitives and software implementation techniques described in this paper may also apply to accelerating FFTs on general-purpose high-performance machines.
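A 3D FFT is separable. It can be computed as three passes of many small, independent 1D FFTs, one pass along each axis, which is why efficient parallelization of small 1D FFTs matters here. In a distributed setting, the data is typically redistributed among nodes between passes. The following serial Python code is a minimal sketch of the decomposition only. It assumes a cubic grid whose side is a power of two, and the function names `fft1d`, `fft3d`, and `dft3d` are hypothetical. It does not model Anton's communication subsystem, its hardware primitives, or the exact data layout used in the paper.

```python
import cmath

def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd-indexed half.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def fft3d(a):
    """3D FFT of a cubic array a[x][y][z], computed as three passes of independent 1D FFTs."""
    n = len(a)
    # Pass 1: 1D FFTs along z. Each (x, y) line is independent of the others.
    b = [[fft1d(a[x][y]) for y in range(n)] for x in range(n)]
    # Pass 2: 1D FFTs along y, one per (x, z) line.
    for x in range(n):
        for z in range(n):
            line = fft1d([b[x][y][z] for y in range(n)])
            for y in range(n):
                b[x][y][z] = line[y]
    # Pass 3: 1D FFTs along x, one per (y, z) line.
    for y in range(n):
        for z in range(n):
            line = fft1d([b[x][y][z] for x in range(n)])
            for x in range(n):
                b[x][y][z] = line[x]
    return b

def dft3d(a):
    """Direct O(N^6) 3D DFT, used only to check fft3d on small inputs."""
    n = len(a)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for kx in range(n):
        for ky in range(n):
            for kz in range(n):
                s = 0j
                for x in range(n):
                    for y in range(n):
                        for z in range(n):
                            phase = -2j * cmath.pi * (kx * x + ky * y + kz * z) / n
                            s += a[x][y][z] * cmath.exp(phase)
                out[kx][ky][kz] = s
    return out
```

For example, on a small 4x4x4 grid, `fft3d(a)` should agree with `dft3d(a)` to within rounding error. Within each pass, every 1D FFT is independent of the others, so the lines can be distributed across processors. The cost of the distributed version then lies largely in moving data between passes, which is where the communication features described above become important.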