Improving data locality with loop transformations
ACM Transactions on Programming Languages and Systems (TOPLAS)
Demonstrating the scalability of a molecular dynamics application on a Petaflop computer
ICS '01 Proceedings of the 15th international conference on Supercomputing
Scalable atomistic simulation algorithms for materials research
Proceedings of the 2001 ACM/IEEE conference on Supercomputing
NAMD: biomolecular simulation on thousands of processors
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Vectorization for SIMD architectures with alignment constraints
Proceedings of the ACM SIGPLAN 2004 conference on Programming language design and implementation
Analysis and Performance Results of a Molecular Modeling Application on Merrimac
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
Merrimac: Supercomputing with Streams
Proceedings of the 2003 ACM/IEEE conference on Supercomputing
Proceedings of the 2006 ACM/IEEE conference on Supercomputing
Anton, a special-purpose machine for molecular dynamics simulation
Proceedings of the 34th annual international symposium on Computer architecture
Executing irregular scientific applications on stream architectures
Proceedings of the 21st annual international conference on Supercomputing
Proceedings of the 2007 ACM/IEEE conference on Supercomputing
Efficient vectorization of SIMD programs with non-aligned and irregular data access hardware
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
A metascalable computing framework for large spatiotemporal-scale atomistic simulations
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
High-order stencil computations on multicore clusters
IPDPS '09 Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing
An efficient vectorization of linked-cell particle simulations
Proceedings of the 9th conference on Computing Frontiers
The Journal of Supercomputing
Hi-index | 0.00 |
We have developed a scalable hierarchical parallelization scheme for molecular dynamics (MD) simulation on multicore clusters. The scheme explores multilevel parallelism combining: (1) Internode parallelism using spatial decomposition via message passing; (2) intercore parallelism using cellular decomposition via multithreading employing a master/worker model; (3) data-level optimization via single-instruction multiple-data (SIMD) parallelism with various code transformation techniques. By using a hierarchy of parallelisms, the scheme exposes very high concurrency and data locality, thereby achieving: (1) internode weak-scaling parallel efficiency 0.985 on 106,496 BlueGene/L nodes (0.975 on 32,768 BlueGene/P nodes), internode strong-scaling parallel efficiency 0.90 on 8,192 BlueGene/L nodes; (2) intercore multithread parallel efficiency 0.65 for eight threads on a dual quadcore Xeon platform; and (3) SIMD speedup around 2 for problem sizes ranging from 3,072 to 98,304 atoms. Furthermore, the effect of memory-access penalty on SIMD performance is analyzed, and an application-based SIMD analysis scheme is proposed to help programmers determine whether their applications are amenable to SIMDization.