Poster: 3D tixels: a highly efficient algorithm for gpu/cpu-acceleration of molecular dynamics on heterogeneous parallel architectures

  • Authors:
  • Szilárd Páll;Berk Hess;Erik Lindahl

  • Affiliations:
  • Royal Institute of Technology, Stockholm, Sweden;Royal Institute of Technology, Stockholm, Sweden;Royal Institute of Technology, Stockholm, Sweden

  • Venue:
  • Proceedings of the 2011 companion on High Performance Computing Networking, Storage and Analysis Companion
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Several GPU-based algorithms have been developed to accelerate biomolecular simulations, but although they provide benefits over single-core implementations, they have not been able to surpass the performance of state-of-the art SIMD CPU implementations (e.g. GROMACS), not to mention efficient scaling. Here, we present a heterogenous parallelization that utilizes both CPU and GPU resources efficiently. A novel fixed-particle-number sub-cell algorithm for non-bonded force calculation was developed. The algorithm uses the SIMD width as algorithmic work unit, it is intrinsically future-proof since it can be adapted to future hardware. The CUDA non-bonded kernel implementation achieves up to 60\% work-efficiency, 1.5 IPC, and 95\% L1 cache utilization. On the CPU OpenMP-parallelized SSE-accelerated code runs overlapping with GPU execution. Fully automated dynamic inter-process as well as CPU-GPU load balancing is employed. We achieve threefold speedup compared to equivalent GROMACS CPU code and show good strong and weak scaling. To the best of our knowledge this the fastest GPU molecular dynamics implementation presented to date.