C++ Templates
The application of graphics processing units (GPUs) to solve partial differential equations is gaining popularity with the advent of improved computer hardware. Various low-level interfaces allow the user to access GPU-specific functions; one such interface is NVIDIA's Compute Unified Device Architecture (CUDA). However, porting existing codes to the GPU requires the user to write kernels that execute on multiple cores in Single Instruction Multiple Data (SIMD) form. In the present work, a higher-level framework, termed CU++, has been developed that uses object-oriented programming techniques available in C++ such as polymorphism, operator overloading, and template metaprogramming. Using this approach, CUDA kernels can be generated automatically at compile time. In short, CU++ allows a developer with only C/C++ knowledge to write programs that execute on the GPU without any knowledge of CUDA-specific programming techniques. This approach is tremendously beneficial for Computational Fluid Dynamics (CFD) code development because it eliminates the need to write hundreds of GPU kernels for various purposes. In its current form, CU++ provides a framework for parallel array arithmetic, simplified data structures for interfacing with the GPU, and smart array indexing. An implementation of heterogeneous parallelism, i.e., using multiple GPUs to simultaneously process a partitioned grid system with communication at the interfaces via the Message Passing Interface (MPI), has also been developed and tested.