The Metropolis Monte Carlo method with CUDA-enabled Graphic Processing Units

  • Authors and affiliations:
  • Clifford Hall — Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA and School of Physics, Astronomy, & Computational Sciences, George Mason University, 44 ...
  • Weixiao Ji — Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA
  • Estela Blaisten-Barojas — Computational Materials Science Center, George Mason University, 4400 University Dr., Fairfax, VA 22030, USA and School of Physics, Astronomy, & Computational Sciences, George Mason University, 44 ...

  • Venue:
  • Journal of Computational Physics
  • Year:
  • 2014

Abstract

We present a CPU-GPU system for runtime acceleration of large molecular simulations using GPU computation and memory swaps. The memory architecture of the GPU can be used both as a container for simulation data stored on the graphics card and as a target for floating-point code, providing an effective means for manipulating atomistic or molecular data on the GPU. To take full advantage of this mechanism, efficient GPU realizations of the algorithms used in atomistic and molecular simulations are essential. Our system implements a versatile molecular engine, including intermolecular interactions and orientational variables, for performing the Metropolis Monte Carlo (MMC) algorithm, a type of Markov chain Monte Carlo. By combining memory objects with floating-point code fragments we have implemented an MMC parallel engine that entirely avoids communication of molecular data at runtime. Our runtime acceleration system is a forerunner of a new class of CPU-GPU algorithms that exploit memory concepts combined with threading to avoid bus-bandwidth and communication bottlenecks. The testbed molecular system used here is a condensed-phase system of oligopyrrole chains. A benchmark shows a size-scaling speedup of 60 for systems of 210,000 pyrrole monomers. Our implementation can easily be combined with MPI to connect several CPU-GPU duets in parallel.
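
The abstract contains no code, so the following is only a minimal CUDA sketch of the general idea it describes: keeping all particle coordinates resident in GPU memory and performing the Metropolis acceptance test on the device, so that no molecular data crosses the CPU-GPU bus during the run. It uses a generic Lennard-Jones pair potential rather than the oligopyrrole force field and orientational variables of the authors' engine, and every name here (deltaEnergyKernel, acceptKernel, the reduced-unit constants) is an illustrative assumption, not their API.

```cuda
// Sketch: single-particle Metropolis trial moves for a Lennard-Jones fluid.
// Coordinates live in GPU global memory for the whole run; only the trial
// displacement and the final acceptance count cross the CPU-GPU bus.
#include <cuda_runtime.h>
#include <cstdio>
#include <cstdlib>

#define N     1024       // number of particles (illustrative)
#define BOX   20.0f      // cubic box edge, reduced units (no periodic images here)
#define BETA  1.0f       // 1/(kT) in reduced units
#define STEP  0.1f       // maximum trial displacement
#define TPB   256        // threads per block

// Sum the energy change for displacing particle i by `disp`; each thread
// handles one partner j, and a shared-memory reduction collects the block sum.
__global__ void deltaEnergyKernel(const float3* pos, int i, float3 disp, float* dE)
{
    __shared__ float cache[TPB];
    int j = blockIdx.x * blockDim.x + threadIdx.x;
    float e = 0.0f;
    if (j < N && j != i) {
        float3 pi = pos[i];
        float3 pt = make_float3(pi.x + disp.x, pi.y + disp.y, pi.z + disp.z);
        float dx = pi.x - pos[j].x, dy = pi.y - pos[j].y, dz = pi.z - pos[j].z;
        float r2 = dx*dx + dy*dy + dz*dz;
        float inv6 = 1.0f / (r2 * r2 * r2);
        float eOld = 4.0f * (inv6 * inv6 - inv6);        // old LJ pair energy
        dx = pt.x - pos[j].x; dy = pt.y - pos[j].y; dz = pt.z - pos[j].z;
        r2 = dx*dx + dy*dy + dz*dz;
        inv6 = 1.0f / (r2 * r2 * r2);
        float eNew = 4.0f * (inv6 * inv6 - inv6);        // trial LJ pair energy
        e = eNew - eOld;
    }
    cache[threadIdx.x] = e;
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {       // block reduction
        if (threadIdx.x < s) cache[threadIdx.x] += cache[threadIdx.x + s];
        __syncthreads();
    }
    if (threadIdx.x == 0) atomicAdd(dE, cache[0]);
}

// Metropolis test done on the device: accept with probability min(1, exp(-beta*dE)).
// The position update happens in GPU memory, so no coordinates are copied back.
__global__ void acceptKernel(float3* pos, int i, float3 disp,
                             const float* dE, float u, int* accepted)
{
    if (u < expf(-BETA * (*dE))) {
        pos[i].x += disp.x;  pos[i].y += disp.y;  pos[i].z += disp.z;
        atomicAdd(accepted, 1);
    }
}

static float frand() { return rand() / (float)RAND_MAX; }

int main()
{
    // Random initial configuration built once on the host, then copied to the GPU.
    float3* h_pos = (float3*)malloc(N * sizeof(float3));
    for (int k = 0; k < N; ++k)
        h_pos[k] = make_float3(BOX * frand(), BOX * frand(), BOX * frand());

    float3* d_pos;  float* d_dE;  int* d_acc;
    cudaMalloc(&d_pos, N * sizeof(float3));
    cudaMalloc(&d_dE, sizeof(float));
    cudaMalloc(&d_acc, sizeof(int));
    cudaMemcpy(d_pos, h_pos, N * sizeof(float3), cudaMemcpyHostToDevice);
    cudaMemset(d_acc, 0, sizeof(int));

    const int nSteps = 100000, blocks = (N + TPB - 1) / TPB;
    for (int step = 0; step < nSteps; ++step) {
        int i = rand() % N;                              // particle to move
        float3 disp = make_float3(STEP * (2.0f * frand() - 1.0f),
                                  STEP * (2.0f * frand() - 1.0f),
                                  STEP * (2.0f * frand() - 1.0f));
        cudaMemset(d_dE, 0, sizeof(float));
        deltaEnergyKernel<<<blocks, TPB>>>(d_pos, i, disp, d_dE);
        acceptKernel<<<1, 1>>>(d_pos, i, disp, d_dE, frand(), d_acc);
    }

    int acc = 0;
    cudaMemcpy(&acc, d_acc, sizeof(int), cudaMemcpyDeviceToHost);
    printf("accepted %d of %d trial moves\n", acc, nSteps);

    cudaFree(d_pos); cudaFree(d_dE); cudaFree(d_acc);
    free(h_pos);
    return 0;
}
```

A production engine along the lines the abstract describes would also generate random numbers on the device (for example with cuRAND) and batch many independent energy terms per kernel launch; the single-thread acceptKernel here only keeps the sketch short while preserving the key point that the accept/reject decision and the position update never leave the GPU.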