Several possibilities exist for implementing the propagation step of lattice Boltzmann methods. This paper describes common implementations and compares the number of memory transfer operations each requires per lattice node update. A performance model based on memory bandwidth is then used to estimate the maximum achievable performance on different machines. A subset of the discussed implementations is benchmarked on several Intel- and AMD-based compute nodes within the framework of an existing flow solver that is specially adapted to simulate flow in porous media, and the model is validated against the measurements. Advanced approaches to the propagation step, such as the "A-A pattern" or the "Esoteric Twist", require more programming effort but often sustain significantly better performance than non-naive but straightforward implementations.
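The bandwidth-based model can be illustrated with a short calculation: the upper performance bound is the sustained memory bandwidth divided by the bytes transferred per lattice node update (LUP). The sketch below is illustrative only. It assumes a D3Q19 lattice, double-precision distribution functions, and commonly cited transfer counts: a two-grid scheme with write-allocate loads 19, stores 19, and write-allocates 19 values per node, while non-temporal stores or in-place schemes such as the A-A pattern and Esoteric Twist avoid the write-allocate. These counts, the scheme labels, the helper names `bytes_per_update` and `max_mlups`, and the 40 GB/s bandwidth are hypothetical choices, not the paper's exact figures or measurements.

```python
# Minimal sketch of a bandwidth-based performance model for the LBM
# propagation step. ASSUMPTIONS (not taken from the paper): D3Q19 lattice,
# double-precision PDFs, and the per-scheme transfer counts below.

Q = 19             # discrete velocities per node (D3Q19, assumed)
BYTES_PER_PDF = 8  # double precision (assumed)

# Hypothetical per-node PDF transfers: (loads, stores, write-allocate loads).
# A write-allocate occurs when a store to a separate destination array first
# pulls the target cache line from memory.
SCHEMES = {
    "two-grid, write-allocate": (Q, Q, Q),
    "two-grid, non-temporal stores": (Q, Q, 0),
    "A-A pattern (in place)": (Q, Q, 0),
    "Esoteric Twist (in place)": (Q, Q, 0),
}


def bytes_per_update(loads: int, stores: int, write_allocates: int) -> int:
    """Bytes moved between main memory and the CPU for one node update."""
    return (loads + stores + write_allocates) * BYTES_PER_PDF


def max_mlups(bandwidth_gb_s: float, bytes_per_lup: int) -> float:
    """Upper bound in MLUP/s: sustained bandwidth divided by bytes per update."""
    return bandwidth_gb_s * 1e9 / bytes_per_lup / 1e6


if __name__ == "__main__":
    bandwidth = 40.0  # hypothetical sustained memory bandwidth in GB/s
    for name, counts in SCHEMES.items():
        b = bytes_per_update(*counts)
        print(f"{name:32s} {b:4d} B/LUP -> {max_mlups(bandwidth, b):6.1f} MLUP/s")
```

Under these assumptions the write-allocate variant moves 456 B/LUP, bounding performance at roughly 87.7 MLUP/s at 40 GB/s, while the 304 B/LUP variants reach roughly 131.6 MLUP/s. This is the kind of gap that motivates the extra programming effort of in-place schemes.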