IPDPSW '12 Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
Heterogeneous systems, whose nodes contain more than one type of computation unit, e.g., central processing units (CPUs) and graphics processing units (GPUs), are becoming popular because of their low cost and high performance. In this paper, we develop a Three-Level Parallelization Scheme (TLPS) for molecular dynamics (MD) simulation on heterogeneous systems. The scheme combines three levels of parallelism: (1) inter-node parallelism using spatial decomposition via message passing, (2) intra-node parallelism using spatial decomposition via dynamically scheduled multi-threading, and (3) intra-chip parallelism using multi-threading and short-vector extensions on CPUs and multiple CUDA threads on GPUs. Using this hierarchy together with optimizations such as intra-node communication hiding and memory optimizations on both CPUs and GPUs, we have implemented and evaluated an MD simulation on TH-1A, a petascale heterogeneous supercomputer. The results show that MD simulations can be parallelized efficiently with TLPS and benefit from these optimizations.
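To make the second level concrete, the following is a minimal sketch of spatial decomposition with dynamically scheduled threads. It is not the authors' implementation. It assumes a Lennard-Jones pair potential in reduced units, a cubic box without periodic boundaries, and Python's standard thread pool as a stand-in for the dynamically scheduled threads the abstract mentions. All function names (`build_cells`, `cell_energy`, `total_energy`, `brute_force`) are hypothetical and chosen for illustration only.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def lj(r2):
    """Lennard-Jones pair energy in reduced units (assumed: epsilon = sigma = 1)."""
    inv6 = 1.0 / (r2 * r2 * r2)
    return 4.0 * (inv6 * inv6 - inv6)


def dist2(a, b):
    """Squared Euclidean distance between two 3D points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_cells(pos, box, rc):
    """Bin particle indices into cubic cells whose edge is at least rc."""
    n = max(1, int(box // rc))
    size = box / n
    cells = {}
    for i, p in enumerate(pos):
        key = tuple(min(int(c / size), n - 1) for c in p)
        cells.setdefault(key, []).append(i)
    return cells, n


def cell_energy(key, cells, n, pos, rc2):
    """Energy of all pairs (i, j), i < j, where particle i lives in cell `key`."""
    e = 0.0
    for d in product((-1, 0, 1), repeat=3):
        nb = tuple(k + s for k, s in zip(key, d))
        if any(c < 0 or c >= n for c in nb):
            continue  # no periodic images in this sketch
        for i in cells[key]:
            for j in cells.get(nb, ()):
                if i < j:
                    r2 = dist2(pos[i], pos[j])
                    if r2 < rc2:
                        e += lj(r2)
    return e


def total_energy(pos, box, rc, workers=4):
    """Sum per-cell energies; each cell is one task handed to the thread pool."""
    cells, n = build_cells(pos, box, rc)
    rc2 = rc * rc
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Idle workers pick up the next pending cell, a rough analogue of
        # dynamic scheduling: uneven cells do not stall the other threads.
        parts = pool.map(lambda k: cell_energy(k, cells, n, pos, rc2), cells)
        return sum(parts)


def brute_force(pos, rc):
    """O(N^2) reference used only to check the cell-based result."""
    rc2 = rc * rc
    e = 0.0
    for i in range(len(pos)):
        for j in range(i + 1, len(pos)):
            r2 = dist2(pos[i], pos[j])
            if r2 < rc2:
                e += lj(r2)
    return e
```

Because each cell edge is at least the cutoff `rc`, every interacting pair lies in the same or an adjacent cell, so the per-cell tasks together cover each pair exactly once. In CPython the global interpreter lock prevents a real speedup from threads here; the sketch shows only the task structure. In a full three-level scheme, each node would own a sub-box of the domain (level 1) and the inner pair loop would be vectorized or mapped to CUDA threads (level 3).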