We describe our experience using NVIDIA's CUDA (Compute Unified Device Architecture) C programming environment to implement a two-dimensional, second-order MUSCL-Hancock ideal magnetohydrodynamics (MHD) solver on a GTX 480 Graphics Processing Unit (GPU). Taking a simple approach in which the MHD variables are stored exclusively in the global memory of the GTX 480 and accessed in a cache-friendly manner (without further optimizing memory access by, for example, staging data in the GPU's faster shared memory), we achieved a maximum speedup of ~126 on a 1024^2 grid relative to the sequential C code running on a single Intel Nehalem (2.8 GHz) core. This speedup is consistent with simple estimates based on the known floating-point performance, memory throughput, and parallel processing capacity of the GTX 480.
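A back-of-envelope check of the kind the abstract alludes to can be sketched as follows. This is not the authors' estimate: the GPU figure is the published GTX 480 spec-sheet peak, and the per-core CPU figure is an assumption (4 single-precision FLOPs per cycle, i.e. one 4-wide SSE operation per cycle), since the abstract does not state how the sequential C code was compiled or vectorized.

```python
# Hedged roofline-style plausibility check for a ~126x GPU-vs-one-core speedup.
# Spec-sheet peak for the GTX 480: 480 CUDA cores x 1.401 GHz x 2 FLOPs (FMA).
gpu_peak_gflops = 1345.0

# ASSUMED per-core peak for a 2.8 GHz Nehalem core: 4 single-precision
# FLOPs/cycle (one 4-wide SSE op per cycle). The real sustained rate of the
# paper's sequential code is unknown, so this is only an illustrative figure.
cpu_peak_gflops = 2.8 * 4  # 11.2 GFLOPS

# If both codes were compute-bound at these rates, the attainable speedup
# would be roughly the ratio of peak throughputs.
compute_bound_speedup = gpu_peak_gflops / cpu_peak_gflops
print(f"compute-bound speedup estimate: ~{compute_bound_speedup:.0f}x")
```

Under these assumed numbers the ratio comes out near 120x, the same order as the measured ~126; a matching memory-side estimate would compare the GTX 480's peak global-memory bandwidth against the bandwidth one CPU core actually sustains.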