Molecular dynamics (MD) simulation has broad applications, and real-world problem sizes demand ever more computing power. The many-core paradigm offers unprecedented computing power, but harvesting it remains a great challenge because of MD's irregular memory-access pattern. To address this challenge, this paper presents a joint application/architecture study to enhance the scalability of MD on Godson-T-like many-core architectures. First, a preprocessing approach based on an adaptive divide-and-conquer framework is designed to exploit locality throughout a memory hierarchy with software-controlled memory. Then three incremental optimization strategies are proposed to enhance on-chip parallelism on the Godson-T many-core processor: a novel data layout to improve data locality, an on-chip locality-aware parallel algorithm to enhance data reuse, and a pipelining algorithm to hide shared-memory access latency. Experiments on a Godson-T simulator show a strong-scaling parallel efficiency of 0.99 on 64 cores, which is confirmed by a field-programmable gate array emulator. In addition, the performance per watt of MD on Godson-T is much higher than that of MD on a 16-core Intel Core i7 symmetric multiprocessor (SMP) and 26 times higher than that of MD on an 8-core, 64-thread Sun T2 processor. Detailed analysis shows that the optimizations that use architectural features to maximize data locality and enhance data reuse benefit scalability the most. Furthermore, a hierarchical parallelization scheme is designed to map the MD algorithm onto a Godson-T many-core cluster, and a simple performance model is derived, which suggests that the optimization scheme is likely to scale well toward exascale. Certain architectural features are found to be essential for these optimizations, which could guide future hardware development.
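The abstract does not spell out the data layout itself. As a generic illustration of how a layout can improve locality in short-range MD, the sketch below sorts atoms by the spatial cell they occupy, so that atoms close in space are also close in memory. This is a common cell-ordering technique, not necessarily the layout proposed in the paper; the function names `cell_of` and `reorder_by_cell`, the uniform grid, and the assumption that all positions lie inside the box are illustrative choices.

```python
from math import floor


def cell_of(pos, box, n_cells):
    """Return the linear cell index of one atom position (x, y, z)."""
    ijk = []
    for x, length, n in zip(pos, box, n_cells):
        i = int(floor(x / length * n))
        ijk.append(min(max(i, 0), n - 1))  # clamp atoms sitting on a boundary
    return (ijk[0] * n_cells[1] + ijk[1]) * n_cells[2] + ijk[2]


def reorder_by_cell(positions, box, cell_size):
    """Sort atoms so that each cell's atoms occupy one contiguous slice.

    Returns the reordered positions, the permutation that was applied, and
    cell_start, where atoms of cell c sit in [cell_start[c], cell_start[c + 1]).
    """
    # Number of cells per dimension (hypothetical uniform grid, at least one).
    n_cells = [max(int(length // cell_size), 1) for length in box]
    keys = [cell_of(p, box, n_cells) for p in positions]

    # Stable sort by cell index keeps the original order within a cell.
    order = sorted(range(len(positions)), key=keys.__getitem__)
    sorted_positions = [positions[i] for i in order]

    # Prefix sum over per-cell counts gives the start offset of every cell.
    total = n_cells[0] * n_cells[1] * n_cells[2]
    counts = [0] * total
    for k in keys:
        counts[k] += 1
    cell_start = [0] * (total + 1)
    for c in range(total):
        cell_start[c + 1] = cell_start[c] + counts[c]
    return sorted_positions, order, cell_start
```

With this ordering, a force loop over one cell and its neighbours reads a few contiguous slices rather than scattered entries, which is the kind of access pattern that caches and software-controlled on-chip memories handle well.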