Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer

  • Authors:
  • E. Bohm; A. Bhatele; L. V. Kalé; M. E. Tuckerman; S. Kumar; J. A. Gunnels; G. J. Martyna

  • Affiliations:
  • Department of Computer Science, Thomas M. Siebel Center, University of Illinois at Urbana-Champaign, Urbana, Illinois (Bohm, Bhatele, Kalé); Department of Chemistry and Courant Institute of Mathematical Sciences, New York University, New York (Tuckerman); IBM Research Division, IBM T. J. Watson Research Center, Yorktown Heights, New York (Kumar, Gunnels); Physical Sciences Division, IBM T. J. Watson Research Center, Yorktown Heights, New York (Martyna)

  • Venue:
  • IBM Journal of Research and Development
  • Year:
  • 2008

Abstract

Important scientific problems can be treated via ab initio-based molecular modeling approaches, wherein atomic forces are derived from an energy function that explicitly considers the electrons. The Car-Parrinello ab initio molecular dynamics (CPAIMD) method is widely used to study small systems containing on the order of 10 to 10³ atoms. However, the impact of CPAIMD has been limited until recently because of difficulties inherent in scaling the technique beyond processor counts roughly equal to the number of electronic states. CPAIMD computations involve a large number of interdependent phases with high interprocessor communication overhead. These phases require the evaluation of various transforms and non-square matrix multiplications that incur large interprocessor data movement when efficiently parallelized. Using the Charm++ parallel programming language and runtime system, the phases are decomposed into a large number of virtual processors, which are, in turn, mapped flexibly onto physical processors, thereby allowing interleaving of work. Algorithmic and IBM Blue Gene/L™ system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states on small systems consisting of 24 to 768 atoms (32 to 1,024 electronic states), demonstrating fine-grained parallelism. The largest systems studied scaled well across the entire machine (20,480 nodes).
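The over-decomposition strategy described in the abstract can be illustrated with a toy sketch. This is plain Python, not actual Charm++ code (Charm++ programs are written in C++ with chare arrays and interface files); the function name and the round-robin placement policy here are illustrative assumptions, standing in for the flexible, topology-aware mappings the paper employs:

```python
# Toy illustration of Charm++-style over-decomposition: the computation
# is split into many more virtual processors (VPs) than there are
# physical processing elements (PEs), so that each PE holds several VPs
# and work from different phases can be interleaved on it.

def map_virtual_to_physical(num_vps, num_pes):
    """Assign each virtual processor to a physical PE (round-robin).

    A production runtime could substitute a topology-aware or
    load-balanced mapping without changing the decomposition itself.
    """
    mapping = {pe: [] for pe in range(num_pes)}
    for vp in range(num_vps):
        mapping[vp % num_pes].append(vp)
    return mapping

# 1,024 virtual processors spread over 64 physical processors:
mapping = map_virtual_to_physical(num_vps=1024, num_pes=64)
print(len(mapping[0]))  # each PE carries 16 VPs whose work can interleave
```

The key design point is that the number of VPs is fixed by the problem decomposition, not by the machine size, which is what lets the method scale to processor counts far beyond the number of electronic states.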