The calculation of radial distribution functions (RDFs) from molecular dynamics trajectory data is a common and computationally expensive analysis task. The rate-limiting step in the calculation of the RDF is building a histogram of the distances between atom pairs in each trajectory frame. Here we present an implementation of this histogramming scheme for multiple graphics processing units (GPUs). The algorithm features a tiling scheme to maximize the reuse of data at the fastest levels of the GPU's memory hierarchy, and dynamic load balancing to allow high performance on heterogeneous configurations of GPUs. Several versions of the RDF algorithm are presented, each utilizing the specific hardware features found on a different generation of GPUs. We take advantage of the larger shared memory and the atomic memory operations available on state-of-the-art GPUs to accelerate the code significantly. The use of atomic memory operations allows the fast, limited-capacity on-chip memory to be used much more efficiently, resulting in a fivefold increase in performance compared to the version of the algorithm without atomic operations. The final version of the algorithm, running in parallel on four NVIDIA GeForce GTX 480 (Fermi) GPUs, was found to be 92 times faster than a multithreaded implementation running on an Intel Xeon 5550 CPU. On this multi-GPU hardware, the RDF between two selections of 1,000,000 atoms each can be calculated in 26.9 s per frame. The multi-GPU RDF algorithms described here are implemented in VMD, a widely used and freely available software package for molecular dynamics visualization and analysis.