A 32x32x32, spatially distributed 3D FFT in four microseconds on Anton

  • Authors:
  • Cliff Young; Joseph A. Bank; Ron O. Dror; J. P. Grossman; John K. Salmon; David E. Shaw

  • Affiliations:
  • D. E. Shaw Research, New York, NY (all six authors)

  • Venue:
  • Proceedings of the Conference on High Performance Computing Networking, Storage and Analysis
  • Year:
  • 2009

Abstract

Anton, a massively parallel special-purpose machine for molecular dynamics simulations, performs a 32x32x32 FFT in 3.7 microseconds and a 64x64x64 FFT in 13.3 microseconds on a 512-node configuration. This is an order of magnitude faster than any other FFT implementation of which we are aware. Achieving this performance requires a coordinated combination of computation and communication techniques that exploit Anton's underlying hardware mechanisms. Most significantly, Anton's communication subsystem provides over 300 gigabits per second of bandwidth per node, message latencies in the hundreds of nanoseconds, and support for word-level writes and single-ended communication. In addition, Anton's general-purpose computation system incorporates primitives that support the efficient parallelization of small 1D FFTs. Although Anton was designed specifically for molecular dynamics simulations, several of the hardware primitives and software implementation techniques described in this paper may also apply to accelerating FFTs on general-purpose high-performance machines.
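The reason small 1D FFTs matter is that a 3D FFT is separable. It can be computed as three passes of independent 1D FFTs, one pass along each axis. For a 32x32x32 grid, each pass is 1,024 independent 32-point transforms. The sketch below shows only that mathematical decomposition on a single process. It does not model how Anton distributes the grid or moves data between nodes, and it is not the authors' implementation. The function names `fft1d`, `fft3d`, and `dft3d` are hypothetical. The 1D transform is a swapped-in textbook recursive radix-2 Cooley-Tukey FFT, and `dft3d` is a direct O(n^6) reference used only for checking.

```python
import cmath


def fft1d(x):
    """Textbook recursive radix-2 Cooley-Tukey FFT (assumed; len(x) must be a power of two)."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of an n x n x n nested list, computed as three passes of batched 1D FFTs.

    Each pass is a batch of n*n independent 1D FFTs. These are the "small 1D FFTs"
    a parallel machine can spread across nodes; this sketch runs them serially.
    """
    n = len(a)
    # Pass 1: transform along z for every (x, y) line.
    b = [[fft1d(a[x][y]) for y in range(n)] for x in range(n)]
    # Pass 2: transform along y for every (x, z) line.
    for x in range(n):
        for z in range(n):
            line = fft1d([b[x][y][z] for y in range(n)])
            for y in range(n):
                b[x][y][z] = line[y]
    # Pass 3: transform along x for every (y, z) line.
    for y in range(n):
        for z in range(n):
            line = fft1d([b[x][y][z] for x in range(n)])
            for x in range(n):
                b[x][y][z] = line[x]
    return b


def dft3d(a):
    """Direct 3D DFT straight from the definition; only practical for tiny n."""
    n = len(a)

    def w(k):
        return cmath.exp(-2j * cmath.pi * k / n)

    return [[[sum(a[x][y][z] * w(kx * x + ky * y + kz * z)
                  for x in range(n) for y in range(n) for z in range(n))
              for kz in range(n)]
             for ky in range(n)]
            for kx in range(n)]


if __name__ == "__main__":
    n = 4
    a = [[[complex((x + 2 * y + 3 * z) % 5, 0) for z in range(n)]
          for y in range(n)] for x in range(n)]
    fast, ref = fft3d(a), dft3d(a)
    err = max(abs(fast[i][j][k] - ref[i][j][k])
              for i in range(n) for j in range(n) for k in range(n))
    print("max abs error vs. direct DFT:", err)
    # Spreading the 32x32x32 grid over the 512 nodes in the abstract
    # leaves 32**3 // 512 grid points per node.
    print("points per node for 32^3 on 512 nodes:", 32 ** 3 // 512)
```

The per-axis passes are the design choice that matters for a distributed machine. Within a pass, every 1D transform is independent. Between passes, data must be regrouped along a different axis. That regrouping is where Anton's low-latency, fine-grained communication becomes the dominant factor.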