Optimal Utilization of Heterogeneous Resources for Biomolecular Simulations

Authors:
Scott S. Hampton; Sadaf R. Alam; Paul S. Crozier; Pratul K. Agarwal
Venue:
Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Year:
2010

Abstract

Biomolecular simulations have traditionally benefited from increases in processor clock speed and from coarse-grained inter-node parallelism on large-scale clusters. With clock frequencies stagnating, microprocessor performance now advances mainly by multiplying cores. Graphics processing units (GPUs) offer revolutionary performance potential at the cost of increased programming complexity. Furthermore, effectively utilizing heterogeneous resources (host processor and GPU cores) for scientific simulations has been extremely challenging because the underlying systems, programming models, and tools are continually evolving. In this paper, we present a parametric study demonstrating approaches that exploit the resources of heterogeneous systems to reduce the time-to-solution of a production-level application for biological simulations. By overlapping and pipelining computation and communication, we observe up to 10-fold application acceleration in multi-core and multi-GPU environments. This is a significant improvement over code acceleration approaches in which the host-to-accelerator ratio is static and constrained by a given algorithmic implementation.
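
As a rough illustration of why overlapping and pipelining computation and communication can shorten time-to-solution, the following sketch compares a serial schedule with a two-stage pipeline in which the transfer of the next chunk overlaps the computation of the current one. It is a simplified timing model under assumed constant per-chunk costs, not the method used in the paper; the function names and the numbers in the usage lines are hypothetical.

```python
def serial_time(n_chunks, t_transfer, t_compute):
    """Total time when each chunk is transferred and then computed, with no overlap."""
    return n_chunks * (t_transfer + t_compute)


def pipelined_time(n_chunks, t_transfer, t_compute):
    """Total time when the transfer of chunk i+1 overlaps the compute of chunk i.

    Assumes constant per-chunk costs and one transfer engine plus one
    compute engine (hypothetical simplification).
    """
    if n_chunks == 0:
        return 0.0
    # The first transfer fills the pipeline and the last compute drains it;
    # every step in between is bounded by the slower of the two stages.
    return t_transfer + (n_chunks - 1) * max(t_transfer, t_compute) + t_compute


# Hypothetical workload: 8 chunks, 1.0 time unit to transfer, 3.0 to compute.
serial = serial_time(8, 1.0, 3.0)
pipelined = pipelined_time(8, 1.0, 3.0)
print(f"serial={serial}, pipelined={pipelined}, speedup={serial / pipelined:.2f}")
```

In this model the benefit of overlap is largest when transfer and compute costs are comparable, since the pipelined total is dominated by the slower stage alone.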