Accurately simulating stratified fluids, which are ubiquitous in nature (atmospheric flows, ocean currents, oil spills, etc.), requires the numerical solution of two-layer shallow water systems. Implementing numerical schemes for these models in realistic scenarios demands very large amounts of computing power. In this paper, we accelerate these simulations on triangular meshes by exploiting the combined power of several CUDA-enabled GPUs in a GPU cluster. To that end, we present an improvement of a path-conservative Roe-type finite volume scheme that is especially suitable for GPU implementation, and we develop a distributed implementation of this scheme that uses CUDA and MPI to exploit the potential of a GPU cluster. This implementation overlaps MPI communication with CPU-GPU memory transfers and GPU computation to increase efficiency. Several numerical experiments, performed on a cluster of modern CUDA-enabled GPUs, show the efficiency of the distributed solver.
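The overlap of communication with computation mentioned above can be illustrated with a deliberately small sketch. It is not the paper's solver. It assumes a 1D array split into two subdomains, a three-point average standing in for the finite volume update, and Python threads standing in for non-blocking MPI exchanges and asynchronous CPU-GPU copies. All function names (`stencil`, `fetch_halo`, `step_overlapped`, `step_serial`) are hypothetical and chosen only for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def stencil(left, centre, right):
    # Toy update: three-point average (assumed stand-in for the real scheme).
    return (left + centre + right) / 3.0


def step_serial(u):
    # Reference step on the undivided domain; the two end cells stay fixed.
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = stencil(u[i - 1], u[i], u[i + 1])
    return new


def fetch_halo(neighbour, index):
    # Assumed stand-in for a non-blocking exchange of one interface value.
    return neighbour[index]


def step_overlapped(left_part, right_part, pool):
    # Each part must hold at least two cells.
    # 1. Start the halo exchange in the background.
    halo_for_left = pool.submit(fetch_halo, right_part, 0)
    halo_for_right = pool.submit(fetch_halo, left_part, len(left_part) - 1)

    # 2. Meanwhile, update the cells that need no remote data.
    new_left = list(left_part)
    for i in range(1, len(left_part) - 1):
        new_left[i] = stencil(left_part[i - 1], left_part[i], left_part[i + 1])
    new_right = list(right_part)
    for i in range(1, len(right_part) - 1):
        new_right[i] = stencil(right_part[i - 1], right_part[i], right_part[i + 1])

    # 3. Wait for the halo values, then update the two interface cells.
    new_left[-1] = stencil(left_part[-2], left_part[-1], halo_for_left.result())
    new_right[0] = stencil(halo_for_right.result(), right_part[0], right_part[1])
    return new_left, new_right


if __name__ == "__main__":
    u = [0.0, 1.0, 4.0, 9.0, 16.0, 25.0, 36.0, 49.0]
    with ThreadPoolExecutor(max_workers=2) as pool:
        left, right = step_overlapped(u[:4], u[4:], pool)
    print(left + right == step_serial(u))
```

The design point is the ordering: the exchange is started first, the cells that depend only on local data are updated while it is in flight, and only the interface cells wait for the remote values. In a real MPI and CUDA code the same ordering would typically be expressed with non-blocking MPI calls and asynchronous device copies rather than threads.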