Performance analysis and optimization of molecular dynamics simulation on Godson-T many-core processor

  • Authors:
  • Liu Peng; Aiichiro Nakano; Guangming Tan; Priya Vashishta; Dongrui Fan; Hao Zhang; Rajiv K. Kalia; Fenglong Song

  • Affiliations:
  • Chinese Academy of Sciences, Beijing, China and University of Southern California, Los Angeles, CA; University of Southern California, Los Angeles, CA; Chinese Academy of Sciences, Beijing, China; University of Southern California, Los Angeles, CA; Chinese Academy of Sciences, Beijing, China; Chinese Academy of Sciences, Beijing, China; University of Southern California, Los Angeles, CA; Chinese Academy of Sciences, Beijing, China

  • Venue:
  • Proceedings of the 8th ACM International Conference on Computing Frontiers
  • Year:
  • 2011

Abstract

Molecular dynamics (MD) simulation has broad applications, but its irregular memory-access pattern makes performance optimization challenging. This paper presents a joint application/architecture study to enhance the on-chip parallelism of MD on Godson-T-like many-core architectures. First, a preprocessing step based on an adaptive divide-and-conquer framework is designed to exploit locality through a memory hierarchy with software-controlled memory. We then propose three incremental optimization strategies: (1) a novel data layout that reorganizes linked-list cell data structures to improve data locality; (2) an on-chip locality-aware parallel algorithm to enhance data reuse; and (3) a pipelining algorithm to hide shared-memory access latency. Experiments on a Godson-T simulator show a strong-scaling parallel efficiency of 0.99 on 64 cores, a result confirmed on an FPGA emulator. Detailed analysis shows that the optimizations that exploit architectural features to maximize data locality and enhance data reuse benefit scalability the most. Furthermore, a simple performance model suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found to be essential for these optimizations, a finding that could guide future hardware development.
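The abstract mentions reorganizing linked-list cell data structures for locality but gives no details of the layout. As a rough illustration only, the Python sketch below assumes the textbook linked-list cell representation (a per-cell `head` array plus a next-atom array `lscl`, with atom positions in `[0, box)`) and flattens it into a cell-contiguous layout, so each cell's atoms occupy one run of memory that could be fetched in a single bulk transfer into software-controlled memory. All function and variable names are hypothetical, and this is not the authors' data layout.

```python
# Hypothetical illustration; not the data layout from the paper.

EMPTY = -1  # end-of-list sentinel (assumed convention)


def build_linked_list_cells(positions, box, n):
    """Textbook linked-list cell construction for a cubic box split into n^3 cells.

    head[c] holds the first atom in cell c; lscl[i] holds the next atom after
    atom i in the same cell. Atoms are prepended, so each list runs in reverse
    insertion order. Positions are assumed to lie in [0, box).
    """
    cell_len = box / n
    head = [EMPTY] * (n ** 3)
    lscl = [EMPTY] * len(positions)
    for i, (x, y, z) in enumerate(positions):
        # Clamp so an atom sitting exactly on the upper box face stays in range.
        cx, cy, cz = (min(int(v / cell_len), n - 1) for v in (x, y, z))
        c = (cx * n + cy) * n + cz  # row-major cell index
        lscl[i] = head[c]
        head[c] = i
    return head, lscl


def to_contiguous_layout(head, lscl, positions):
    """Walk each cell's list once and pack its atoms into one contiguous run.

    Atoms of cell c end up in packed[cell_start[c]:cell_start[c + 1]], so a
    core can read a whole cell sequentially instead of chasing pointers
    scattered across memory.
    """
    cell_start = [0] * (len(head) + 1)
    order = []  # order[k] = original index of the k-th packed atom
    for c, first in enumerate(head):
        cell_start[c] = len(order)
        i = first
        while i != EMPTY:
            order.append(i)
            i = lscl[i]
    cell_start[len(head)] = len(order)
    packed = [positions[i] for i in order]
    return cell_start, packed, order


if __name__ == "__main__":
    pos = [(0.1, 0.1, 0.1), (1.9, 1.9, 1.9), (0.2, 0.3, 0.4), (1.5, 0.2, 0.2)]
    head, lscl = build_linked_list_cells(pos, box=2.0, n=2)
    cell_start, packed, order = to_contiguous_layout(head, lscl, pos)
    print(order)       # [2, 0, 3, 1]
    print(cell_start)  # [0, 2, 2, 2, 2, 3, 3, 3, 4]
```

A counting sort on cell indices would yield the same kind of cell-contiguous layout in O(N + number of cells) without building the linked lists at all. The traversal form is used here only because it starts from the linked-list cell structure that the abstract names.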