We propose and analyze threading algorithms for hybrid MPI/OpenMP parallelization of a molecular-dynamics simulation, which are scalable on large multicore clusters. Two data-privatization thread scheduling algorithms via nucleation-growth allocation are introduced: (1) compact-volume allocation scheduling (CVAS); and (2) breadth-first allocation scheduling (BFAS). The algorithms combine fine-grain dynamic load balancing and minimal memory-footprint data privatization threading. We show that the computational costs of CVAS and BFAS are bounded by Θ(n^{5/3} p^{-2/3}) and Θ(n), respectively, for p threads working on n particles on a multicore compute node. Memory consumption per node of both algorithms scales as O(n + n^{2/3} p^{1/3}), but CVAS has smaller prefactors due to a geometric effect. Based on these analyses, we derive the selection criterion between the two algorithms in terms of the granularity, n/p. We observe that memory consumption is reduced by 75% for p = 16 and n = 8,192 compared to a naïve data privatization, while maintaining thread imbalance below 5%. We obtain a strong-scaling speedup of 14.4 with 16-way threading on a four quad-core AMD Opteron node. In addition, our MPI/OpenMP code achieves 2.58× and 2.16× speedups over the MPI-only implementation on 32,768 cores of BlueGene/P for 0.84 and 1.68 million particle systems, respectively.
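
The quantities quoted in the abstract can be related with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: the function names are hypothetical, the prefactors c1 and c2 are set to 1 as a placeholder (the abstract does not give them), and the naive baseline assumes that each of the p threads keeps a full private copy of an n-element array. Because of the unit prefactors, the estimated footprint is not expected to reproduce the measured 75% reduction.

```python
def naive_privatization_memory(n: int, p: int) -> float:
    """Assumed baseline: each of p threads holds a full private copy of n elements."""
    return float(n * p)


def scaled_memory_estimate(n: int, p: int, c1: float = 1.0, c2: float = 1.0) -> float:
    """Leading terms of O(n + n^(2/3) p^(1/3)).

    c1 and c2 are hypothetical prefactors; the abstract states only the scaling.
    """
    return c1 * n + c2 * n ** (2.0 / 3.0) * p ** (1.0 / 3.0)


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Fraction of ideal speedup achieved by the given number of threads."""
    return speedup / threads


# Values taken from the abstract.
n, p = 8192, 16

granularity = n / p                         # particles per thread
efficiency = parallel_efficiency(14.4, 16)  # 16-way threading on one node

naive = naive_privatization_memory(n, p)
estimate = scaled_memory_estimate(n, p)

print(f"granularity n/p       = {granularity:.0f}")
print(f"threading efficiency  = {efficiency:.2f}")
print(f"naive footprint       = {naive:.0f} elements (assumed n*p)")
print(f"scaling-law footprint = {estimate:.0f} elements (unit prefactors)")
```

For the reported node-level run, the granularity is 512 particles per thread and the 14.4 speedup corresponds to 90% threading efficiency.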