We present a state-of-the-art shallow water simulator running on multiple GPUs. Our implementation is based on an explicit high-resolution finite volume scheme suitable for modeling dam breaks and flooding. We use row domain decomposition to enable multi-GPU computation, and perform traditional CUDA block decomposition within each GPU for further parallelism. Our implementation shows near-perfect weak and strong scaling, and enables simulation of domains consisting of up to 235 million cells at a rate of over 1.2 gigacells per second using four Fermi-generation GPUs. The code is thoroughly benchmarked on three different systems, ranging from high-performance to commodity-level hardware.