Adapting a message-driven parallel application to GPU-accelerated clusters

Authors:
James C. Phillips; John E. Stone; Klaus Schulten
Affiliations:
University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL; University of Illinois at Urbana-Champaign, Urbana, IL
Venue:
Proceedings of the 2008 ACM/IEEE conference on Supercomputing
Year:
2008

Abstract

Graphics processing units (GPUs) have become an attractive option for accelerating scientific computations, owing both to advances in the performance and flexibility of GPU hardware and to the availability of GPU software development tools targeting general-purpose and scientific computation. However, effective use of GPUs in clusters presents a number of application development and system integration challenges. We describe strategies for the decomposition and scheduling of computation among CPU cores and GPUs, and techniques for overlapping communication and CPU computation with GPU kernel execution. We report the adaptation of these techniques to NAMD, a widely used parallel molecular dynamics simulation package, and present performance results for a 64-core, 64-GPU cluster.
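
The overlap of CPU computation with GPU kernel execution mentioned in the abstract can be illustrated with a small CPU-only simulation. This is a minimal sketch under stated assumptions, not NAMD's implementation: a worker thread stands in for the GPU, and the names `fake_gpu_kernel` and `run_step` are hypothetical. The host launches the "kernel" without blocking, then keeps processing queued CPU tasks and polls for completion between them, which is the general shape of a message-driven loop.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
import time


def fake_gpu_kernel(values):
    """Hypothetical stand-in for an asynchronous GPU kernel."""
    time.sleep(0.05)  # pretend the device is busy for a while
    return [v * v for v in values]


def run_step(gpu_input, cpu_tasks):
    """Run CPU tasks while the simulated kernel executes.

    Returns (gpu_result, cpu_results, log), where log records event order.
    """
    pending = deque(cpu_tasks)
    cpu_results = []
    log = []
    with ThreadPoolExecutor(max_workers=1) as device:
        # Non-blocking launch: control returns to the host immediately.
        future = device.submit(fake_gpu_kernel, gpu_input)
        log.append("launch")
        # Poll for completion between units of CPU work.
        while pending or not future.done():
            if pending:
                task = pending.popleft()
                cpu_results.append(task())
                log.append("cpu")
            else:
                # No CPU work left; yield briefly before polling again.
                time.sleep(0.001)
        gpu_result = future.result()
        log.append("gpu_done")
    return gpu_result, cpu_results, log


if __name__ == "__main__":
    tasks = [lambda i=i: i + 1 for i in range(4)]
    gpu_result, cpu_results, log = run_step([1, 2, 3], tasks)
    print(gpu_result, cpu_results, log)
```

On real hardware the polling step would typically be a non-blocking completion query such as CUDA's `cudaEventQuery` or `cudaStreamQuery` rather than a check on a thread future. Which mechanism the authors use is not stated in the abstract.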