Biomolecular simulations have traditionally benefited from increases in processor clock speed and from coarse-grained inter-node parallelism on large-scale clusters. With clock frequencies stagnating, microprocessor performance now continues to grow mainly through increasing core counts. Graphics processing units (GPUs) offer revolutionary performance potential at the cost of increased programming complexity. Furthermore, it has been extremely challenging to use heterogeneous resources (host processor and GPU cores) effectively for scientific simulations, because the underlying systems, programming models, and tools are continually evolving. In this paper, we present a parametric study demonstrating approaches that exploit the resources of heterogeneous systems to reduce the time-to-solution of a production-level application for biological simulations. By overlapping and pipelining computation and communication, we observe up to 10-fold application acceleration in multi-core and multi-GPU environments. This is a significant improvement over code acceleration approaches in which the host-to-accelerator ratio is static and constrained by a given algorithmic implementation.
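The idea of overlapping and pipelining computation with communication can be illustrated with a minimal sketch. This is not the paper's implementation: `transfer`, `compute`, `run_serial`, `run_pipelined`, and the `depth` parameter are hypothetical names chosen for illustration, the "transfer" is a plain copy standing in for a host-to-accelerator copy, and the "kernel" is a sum of squares standing in for a force calculation. A producer thread stages the next chunk while the main thread computes on the current one, with a bounded queue acting as the staging buffers.

```python
import queue
import threading


def transfer(chunk):
    # Hypothetical stand-in for a host-to-accelerator copy.
    return list(chunk)


def compute(chunk):
    # Hypothetical stand-in for a compute kernel: sum of squares.
    return sum(x * x for x in chunk)


def run_serial(chunks):
    # Baseline: each chunk is transferred, then computed, one at a time.
    return [compute(transfer(c)) for c in chunks]


def run_pipelined(chunks, depth=2):
    # Bounded queue: at most `depth` staged chunks are in flight,
    # analogous to double buffering when depth == 2.
    staged = queue.Queue(maxsize=depth)

    def producer():
        # Communication stage: stage chunks ahead of the consumer.
        for c in chunks:
            staged.put(transfer(c))
        staged.put(None)  # sentinel marking the end of the stream

    worker = threading.Thread(target=producer)
    worker.start()

    results = []
    while True:
        item = staged.get()
        if item is None:
            break
        # Computation stage: runs while the producer stages the next chunk.
        results.append(compute(item))

    worker.join()
    return results


# Four chunks of four values each, covering 0..15.
chunks = [range(i, i + 4) for i in range(0, 16, 4)]
```

Both functions return the same per-chunk results in the same order; only the scheduling differs. In a real heterogeneous run the benefit depends on how transfer time compares with kernel time, which is the kind of ratio a parametric study would vary.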