Extending stability beyond CPU millennium: a micron-scale atomistic simulation of Kelvin-Helmholtz instability

Authors:
J. N. Glosli; D. F. Richards; K. J. Caspersen; R. E. Rudd; J. A. Gunnels; F. H. Streitz
Affiliations:
Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA; Lawrence Livermore National Laboratory, Livermore, CA; IBM Corporation, Yorktown Heights, New York; Lawrence Livermore National Laboratory, Livermore, CA
Venue:
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Year:
2007

Abstract

We report the computational advances that have enabled the first micron-scale simulation of a Kelvin-Helmholtz (KH) instability using molecular dynamics (MD). The advances are in three key areas for massively parallel computation on machines such as BlueGene/L (BG/L): fault tolerance, application kernel optimization, and highly efficient parallel I/O. In particular, we have developed novel capabilities for handling hardware parity errors and for improving the speed of interatomic force calculations, while achieving near-optimal I/O speeds on BG/L. Together these allow us to achieve excellent scalability and improve overall application performance. As a result, we have successfully conducted a 2-billion-atom KH simulation amounting to 2.8 CPU-millennia of run time, including a single, continuous simulation run in excess of 1.5 CPU-millennia. We have also conducted 9-billion-atom and 62.5-billion-atom KH simulations. The current optimized ddcMD code is benchmarked at 115.1 TFlop/s in our scaling study and 103.9 TFlop/s in a sustained science run, with additional improvements ongoing. These improvements enabled us to run the first MD simulations of micron-scale systems developing the KH instability.