Exploiting 162-Nanosecond End-to-End Communication Latency on Anton

  • Authors:
  • Ron O. Dror; J. P. Grossman; Kenneth M. Mackenzie; Brian Towles; Edmond Chow; John K. Salmon; Cliff Young; Joseph A. Bank; Brannon Batson; Martin M. Deneroff; Jeffrey S. Kuskin; Richard H. Larson; Mark A. Moraes; David E. Shaw

  • Venue:
  • Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
  • Year:
  • 2010

Abstract

Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency. This paper describes the techniques used to mitigate latency in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art. Achieving this speedup required a combination of hardware mechanisms and software constructs to reduce network latency, sender and receiver overhead, and synchronization costs. Key elements of Anton's approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes, leveraging fine-grained communication, and establishing fixed, optimized communication patterns. Anton delivers software-to-software inter-node latency significantly lower than any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 4% that of the next fastest MD platform.