In this work, we investigate the global memory access mechanism on recent GPUs. For this study, we wrote dedicated benchmark programs that let us explore how global memory transactions are scheduled. From these results, we formulate a model capable of estimating the execution time for a large class of applications. Our main goal is to facilitate optimisation of regular data-parallel applications on GPUs. As an example, we describe our CUDA implementations of lattice Boltzmann method (LBM) flow solvers, for which our model estimated performance with less than 5% relative error.
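To make the idea of estimating performance from memory behaviour concrete, here is a minimal sketch of a generic bandwidth-bound estimate for an LBM solver, together with the relative-error check. This is not the model formulated in the paper, which is built on the scheduling of global memory transactions. It assumes a memory-bound kernel that loads and stores each distribution value exactly once per node update. The function names, the D3Q19 stencil, and the single-precision default are illustrative assumptions.

```python
def estimate_mlups(bandwidth_gb_s, q=19, bytes_per_value=4):
    """Bandwidth-bound estimate of million lattice-node updates per second.

    Assumption (not from the paper): each node update reads and writes
    q distribution values once, and the kernel is purely memory bound.
    """
    bytes_per_node = 2 * q * bytes_per_value  # q loads plus q stores
    return bandwidth_gb_s * 1e9 / bytes_per_node / 1e6


def relative_error(estimated, measured):
    """Relative error of an estimate with respect to a measured value."""
    return abs(estimated - measured) / measured
```

With these assumptions, one D3Q19 node update in single precision moves 152 bytes, so the estimate scales linearly with the sustained bandwidth passed in. Comparing the estimate against a measured throughput with `relative_error` gives the kind of figure quoted above, although the inputs for that comparison would have to come from actual benchmark runs.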