Enabling and scaling biomolecular simulations of 100 million atoms on petascale machines with a multicore-optimized message-driven runtime

Authors:
Chao Mei;Yanhua Sun;Gengbin Zheng;Eric J. Bohm;Laxmikant V. Kale;James C. Phillips;Chris Harrison
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL;University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2011

Abstract

A 100-million-atom biomolecular simulation with NAMD is one of the three benchmarks for the NSF-funded sustainable petascale machine. Simulating a molecular system this large on a petascale machine presents great challenges, including handling I/O, managing a large memory footprint, and achieving good strong scaling. In this paper, we present parallel I/O techniques that enable the simulation. We design a new SMP model that extends the Charm++ asynchronous message-driven runtime to make efficient use of the now-ubiquitous wide multicore clusters. We exploit node-aware techniques to optimize both the application and the underlying SMP runtime. Hierarchical load balancing is further exploited to scale NAMD to the full Jaguar PF Cray XT5 (224,076 cores) at Oak Ridge National Laboratory, both with and without PME full electrostatics, achieving 93% parallel efficiency (relative to a 6,720-core run) at 9 ms per step for a simple cutoff calculation. Excellent scaling is also obtained on 65,536 cores of the Intrepid Blue Gene/P at Argonne National Laboratory.
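
To make the efficiency figure concrete, the sketch below works through the arithmetic behind "93% parallel efficiency relative to 6,720 cores". It assumes the usual strong-scaling definition (measured speedup divided by the ideal speedup implied by the core-count ratio); the abstract does not spell out the formula, so this is an assumption. The function names are hypothetical, and the baseline time per step is not reported in the abstract: it is back-computed here from the rounded 93% and 9 ms figures, so it is only approximate.

```python
def relative_parallel_efficiency(base_cores, base_time_ms, cores, time_ms):
    """Strong-scaling efficiency of a large run relative to a smaller baseline.

    Assumed definition: (measured speedup) / (ideal speedup), where the ideal
    speedup is simply the ratio of core counts.
    """
    measured_speedup = base_time_ms / time_ms
    ideal_speedup = cores / base_cores
    return measured_speedup / ideal_speedup


def implied_baseline_time_ms(base_cores, cores, time_ms, efficiency):
    """Invert the definition above to recover the baseline time per step."""
    return efficiency * time_ms * cores / base_cores


# Figures quoted in the abstract (cutoff-only run on Jaguar PF).
BASE_CORES = 6720
FULL_CORES = 224076
STEP_MS = 9.0       # rounded in the abstract
EFFICIENCY = 0.93   # rounded in the abstract

# Derived, not reported: approximate time per step on the 6,720-core baseline.
baseline_ms = implied_baseline_time_ms(BASE_CORES, FULL_CORES, STEP_MS, EFFICIENCY)

# Round trip: plugging the derived baseline back in reproduces the efficiency.
check = relative_parallel_efficiency(BASE_CORES, baseline_ms, FULL_CORES, STEP_MS)

print(f"core-count ratio: {FULL_CORES / BASE_CORES:.2f}x")
print(f"implied baseline: {baseline_ms:.0f} ms per step (approximate)")
print(f"efficiency check: {check:.2f}")
```

Under these assumptions, a roughly 33-fold increase in core count corresponds to a roughly 31-fold reduction in time per step, which is what a 93% efficiency means in practice.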