Strong scaling of scientific applications on parallel architectures is increasingly limited by communication latency. This paper describes the techniques used to mitigate latency in Anton, a massively parallel special-purpose machine that accelerates molecular dynamics (MD) simulations by orders of magnitude compared with the previous state of the art. Achieving this speedup required a combination of hardware mechanisms and software constructs to reduce network latency, sender and receiver overhead, and synchronization costs. Key elements of Anton's approach, in addition to tightly integrated communication hardware, include formulating data transfer in terms of counted remote writes, leveraging fine-grained communication, and establishing fixed, optimized communication patterns. Anton delivers software-to-software inter-node latency significantly lower than that of any other large-scale parallel machine, and the total critical-path communication time for an Anton MD simulation is less than 4% of that of the next-fastest MD platform.
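The idea of a counted remote write can be illustrated with a small software model. This is only a sketch under stated assumptions, not Anton's hardware or API: it assumes that each sender writes directly into a preassigned slot of the receiver's memory, that every arriving write increments a receiver-side counter, and that the receiver knows in advance how many writes to expect because the communication pattern is fixed. The names `CountedBuffer`, `remote_write`, `ready`, and `exchange` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CountedBuffer:
    """Receiver-side buffer with a synchronization counter (hypothetical model)."""
    expected: int                              # writes the receiver knows to expect
    slots: dict = field(default_factory=dict)  # preassigned slot -> received data
    count: int = 0                             # writes that have arrived so far

    def remote_write(self, slot, value):
        # The sender pushes data straight into its slot; the arrival bumps the counter.
        self.slots[slot] = value
        self.count += 1

    def ready(self):
        # The receiver checks one counter instead of handling a per-message handshake.
        return self.count >= self.expected


def exchange(senders):
    """Deliver one write per sender, then consume the data once all have arrived.

    `senders` maps a sender id (used here as the slot) to its payload.
    """
    buf = CountedBuffer(expected=len(senders))
    for sender_id, payload in senders.items():
        buf.remote_write(sender_id, payload)
    if not buf.ready():
        raise RuntimeError("not all expected writes have arrived")
    return sum(buf.slots.values())


if __name__ == "__main__":
    print(exchange({0: 1.5, 1: 2.5, 2: 3.0}))
```

In this model the receiver never replies to a sender, and synchronization reduces to comparing a single counter against a count fixed ahead of time, which is one way to read the abstract's emphasis on reducing receiver overhead and synchronization costs.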