Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer

  • Authors:
  • E. Bohm; A. Bhatele; L. V. Kalé; M. E. Tuckerman; S. Kumar; J. A. Gunnels; G. J. Martyna

  • Affiliations:
  • Department of Computer Science, Thomas M. Siebel Center, University of Illinois at Urbana-Champaign, Urbana, Illinois (Bohm, Bhatele, Kalé); Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, New York (Tuckerman); IBM Research Division, IBM T. J. Watson Research Center, Yorktown Heights, New York (Kumar, Gunnels); Physical Sciences Division, IBM T. J. Watson Research Center, Yorktown Heights, New York (Martyna)

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2008

Abstract

Important scientific problems can be treated via ab initio-based molecular modeling approaches, wherein atomic forces are derived from an energy function that explicitly considers the electrons. The Car-Parrinello ab initio molecular dynamics (CPAIMD) method is widely used to study small systems containing on the order of 10 to 10³ atoms. However, the impact of CPAIMD has been limited until recently because of difficulties inherent in scaling the technique beyond processor counts roughly equal to the number of electronic states. CPAIMD computations involve a large number of interdependent phases with high interprocessor communication overhead. These phases require the evaluation of various transforms and non-square matrix multiplications that incur large interprocessor data movement when efficiently parallelized. Using the Charm++ parallel programming language and runtime system, the phases are decomposed into a large number of virtual processors, which are, in turn, mapped flexibly onto physical processors, thereby allowing interleaving of work. Algorithmic and IBM Blue Gene/L™ system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states on small systems consisting of 24 to 768 atoms (32 to 1,024 electronic states), demonstrating fine-grained parallelism. The largest systems studied scaled well across the entire machine (20,480 nodes).
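The over-decomposition strategy described in the abstract can be illustrated with a toy sketch. This is plain Python, not actual Charm++ code (Charm++ programs are written in C++ with chare arrays and interface files); the function name and the round-robin placement policy here are illustrative assumptions, standing in for the flexible, topology-aware mappings the paper employs:

```python
# Toy illustration of Charm++-style over-decomposition: the computation
# is split into many more virtual processors (VPs) than there are
# physical processing elements (PEs), so that each PE holds several VPs
# and work from different phases can be interleaved on it.

def map_virtual_to_physical(num_vps, num_pes):
    """Assign each virtual processor to a physical PE (round-robin).

    A production runtime could substitute a topology-aware or
    load-balanced mapping without changing the decomposition itself.
    """
    mapping = {pe: [] for pe in range(num_pes)}
    for vp in range(num_vps):
        mapping[vp % num_pes].append(vp)
    return mapping

# 1,024 virtual processors spread over 64 physical processors:
mapping = map_virtual_to_physical(num_vps=1024, num_pes=64)
print(len(mapping[0]))  # each PE carries 16 VPs whose work can interleave
```

The key design point is that the number of VPs is fixed by the problem decomposition, not by the machine size, which is what lets the method scale to processor counts far beyond the number of electronic states.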