Exploiting hierarchy parallelism for molecular dynamics on a petascale heterogeneous system

Authors:
Qiang Wu; Canqun Yang; Tao Tang; Liquan Xiao
Venue:
Journal of Parallel and Distributed Computing
Year:
2013

Abstract

Heterogeneous systems whose nodes contain more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we develop a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme exploits multi-level parallelism by combining (1) inter-node parallelism, using spatial decomposition via message passing; (2) intra-node parallelism, using spatial decomposition via dynamically scheduled multi-threading; and (3) intra-chip parallelism, using multi-threading and short vector extensions on CPUs and multiple CUDA threads on GPUs. Using this hierarchy of parallelism together with optimizations such as intra-node communication hiding and memory optimizations on both CPUs and GPUs, we have implemented and evaluated an MD simulation on TH-1A (Tianhe-1A), a petascale heterogeneous supercomputer. The results show that MD simulations can be efficiently parallelized with the TLPS scheme and benefit from these optimizations.
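The abstract names the levels of TLPS but not their mechanics. The following is a minimal, hypothetical Python sketch of the first two levels for a short-range pair search. It is not the authors' implementation. Assumptions: a periodic orthorhombic box; subdomains numbered row-major on a `dims` grid (level 1); cells whose edge is at least the cutoff `rc` (level 2); and a shared work queue from which threads pull cells, mimicking dynamic scheduling plus a reduction. The function names `owner_rank`, `build_cells`, and `count_pairs_threaded` are illustrative only. Message passing, communication hiding, and level 3 (SIMD and CUDA kernels) are not modelled. Because CPython threads share one interpreter lock, the sketch demonstrates the scheduling pattern rather than a speedup.

```python
import itertools
import queue
import threading

# Hypothetical sketch of TLPS levels 1-2 (not the paper's code).
# Level 3 (SIMD / CUDA) and MPI halo exchange are not modelled.

def owner_rank(pos, box, dims):
    """Level 1 (inter-node): rank owning the subdomain that contains pos.
    Subdomains form a dims[0] x dims[1] x dims[2] grid, numbered row-major."""
    idx = [min(int(p / b * d), d - 1) for p, b, d in zip(pos, box, dims)]
    return (idx[0] * dims[1] + idx[1]) * dims[2] + idx[2]

def min_image_dist2(a, b, box):
    """Squared distance under periodic boundaries (minimum-image convention)."""
    d2 = 0.0
    for x, y, length in zip(a, b, box):
        dx = x - y
        dx -= length * round(dx / length)
        d2 += dx * dx
    return d2

def build_cells(pos, box, rc):
    """Bin atoms into a periodic cell grid whose cell edge is >= rc."""
    n = [max(1, int(length // rc)) for length in box]
    cells = {}
    for i, p in enumerate(pos):
        c = tuple(min(int(x / length * k), k - 1) for x, length, k in zip(p, box, n))
        cells.setdefault(c, []).append(i)
    return n, cells

def count_pairs_threaded(pos, box, rc, nthreads=4):
    """Level 2 (intra-node): threads pull cells from a shared queue
    (dynamic scheduling) and add their local pair counts under a lock."""
    n, cells = build_cells(pos, box, rc)
    work = queue.Queue()
    for c in itertools.product(*(range(k) for k in n)):
        work.put(c)
    total = [0]
    lock = threading.Lock()
    rc2 = rc * rc

    def worker():
        local = 0
        while True:
            try:
                c = work.get_nowait()  # grab the next unprocessed cell
            except queue.Empty:
                break
            # A set removes duplicate neighbours on small periodic grids.
            neigh = {tuple((ci + d) % k for ci, d, k in zip(c, off, n))
                     for off in itertools.product((-1, 0, 1), repeat=3)}
            for i in cells.get(c, []):
                for nc in neigh:
                    for j in cells.get(nc, []):
                        # i < j counts each unordered pair exactly once.
                        if i < j and min_image_dist2(pos[i], pos[j], box) < rc2:
                            local += 1
        with lock:  # reduction of per-thread partial sums
            total[0] += local

    threads = [threading.Thread(target=worker) for _ in range(nthreads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total[0]
```

Pulling cells from a queue rather than giving each thread a fixed block is meant to reflect why dynamic scheduling helps: cells hold different numbers of atoms, so the per-cell cost varies. In a real code, level 1 would also exchange boundary (halo) atoms between neighbouring ranks before the pair search.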