Exploiting hierarchical parallelisms for molecular dynamics simulation on multicore clusters

  • Authors:
  • Liu Peng; Manaschai Kunaseth; Hikmet Dursun; Ken-Ichi Nomura; Weiqiang Wang; Rajiv K. Kalia; Aiichiro Nakano; Priya Vashishta

  • Affiliations:
  • All authors: Collaboratory for Advanced Computing and Simulations (CACS), University of Southern California, Los Angeles, CA 90089-0242, USA

  • Venue:
  • The Journal of Supercomputing
  • Year:
  • 2011

Abstract

We have developed a scalable hierarchical parallelization scheme for molecular dynamics (MD) simulation on multicore clusters. The scheme exploits multilevel parallelism by combining: (1) internode parallelism using spatial decomposition via message passing; (2) intercore parallelism using cellular decomposition via multithreading based on a master/worker model; and (3) data-level optimization via single-instruction multiple-data (SIMD) parallelism using various code transformation techniques. By using a hierarchy of parallelisms, the scheme exposes very high concurrency and data locality, thereby achieving: (1) an internode weak-scaling parallel efficiency of 0.985 on 106,496 BlueGene/L nodes (0.975 on 32,768 BlueGene/P nodes) and an internode strong-scaling parallel efficiency of 0.90 on 8,192 BlueGene/L nodes; (2) an intercore multithreading parallel efficiency of 0.65 for eight threads on a dual quad-core Xeon platform; and (3) a SIMD speedup of approximately 2 for problem sizes ranging from 3,072 to 98,304 atoms. Furthermore, the effect of the memory-access penalty on SIMD performance is analyzed, and an application-based SIMD analysis scheme is proposed to help programmers determine whether their applications are amenable to SIMDization.
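
The abstract describes a three-level hierarchy: MPI ranks across nodes (spatial decomposition), threads across cores (cellular decomposition), and SIMD within each core. The sketch below is a minimal, illustrative rendering of that structure in C with MPI and OpenMP; it is not the authors' code. Names such as Cell and compute_cell_forces, the cell count, and the Lennard-Jones-like force expression are assumptions made for illustration, the OpenMP dynamic schedule stands in for the master/worker thread model, and the omp simd pragma stands in for the hand-applied SIMD code transformations described in the paper.

/* Illustrative sketch of the hierarchical parallelization described in the
 * abstract (not the authors' code):
 *   level 1: MPI ranks own spatial subdomains (internode, message passing)
 *   level 2: threads process cells within a subdomain (intercore)
 *   level 3: SIMD inside the per-cell pair loop (data level)            */
#include <mpi.h>
#include <omp.h>
#include <stdlib.h>

typedef struct {
    int     n;                /* number of atoms in this cell              */
    double *x, *y, *z;        /* coordinates, structure-of-arrays layout   */
    double *fx, *fy, *fz;     /* forces, same layout for stride-one access */
} Cell;

/* Pairwise Lennard-Jones-like forces within one cell; the dependence-free
 * inner loop is the part amenable to SIMD vectorization.                 */
static void compute_cell_forces(Cell *c)
{
    for (int i = 0; i < c->n; ++i) {
        double fxi = 0.0, fyi = 0.0, fzi = 0.0;
        #pragma omp simd reduction(+:fxi,fyi,fzi)
        for (int j = 0; j < c->n; ++j) {
            if (j != i) {
                double dx = c->x[i] - c->x[j];
                double dy = c->y[i] - c->y[j];
                double dz = c->z[i] - c->z[j];
                double r2 = dx*dx + dy*dy + dz*dz + 1e-12;
                double inv6 = 1.0 / (r2*r2*r2);
                double f = (48.0*inv6 - 24.0) * inv6 / r2;
                fxi += f*dx; fyi += f*dy; fzi += f*dz;
            }
        }
        c->fx[i] += fxi; c->fy[i] += fyi; c->fz[i] += fzi;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);            /* level 1: one spatial subdomain per rank */
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    (void)rank; (void)nprocs;          /* unused in this skeleton */

    /* Each rank's subdomain is further divided into cells; here we only
     * allocate empty placeholder cells.                                  */
    int ncells = 64;
    Cell *cells = calloc(ncells, sizeof(Cell));

    /* level 2: cells farmed out to threads (dynamic schedule approximates
     * the master/worker model).                                          */
    #pragma omp parallel for schedule(dynamic)
    for (int c = 0; c < ncells; ++c)
        compute_cell_forces(&cells[c]); /* level 3: SIMD inside each cell */

    /* Halo exchange of boundary atoms between neighboring subdomains
     * (e.g. via MPI_Sendrecv) would go here in a real MD code.           */
    MPI_Barrier(MPI_COMM_WORLD);
    free(cells);
    MPI_Finalize();
    return 0;
}

The structure-of-arrays layout is what keeps the inner loop stride-one and therefore vectorizable; with an array-of-structures layout, the same loop would incur the kind of memory-access penalty whose effect on SIMD performance the abstract says is analyzed in the paper.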