We develop a performance prediction model for a parallelized sparse lattice Boltzmann solver and present performance results for simulations of flow in a variety of complex geometries. We focus in particular on partitioning and memory/load-balancing strategies for geometries with a high solid fraction and/or complex topology, such as porous media, fissured rocks, and geometries from medical applications. The topology of the lattice nodes representing the fluid fraction of the computational domain is mapped onto a graph. Graph decomposition is performed with both multilevel recursive-bisection and multilevel k-way schemes based on modified Kernighan-Lin and Fiduccia-Mattheyses partitioning algorithms. Performance results and optimization strategies are presented for a variety of platforms, showing a parallel efficiency of almost 80% for the largest problem size. Good agreement between the performance model and the experimental results is demonstrated.
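To illustrate the step of mapping fluid lattice nodes onto a graph, the following is a minimal sketch, not the authors' implementation. It assumes a 2D lattice with a D2Q9 neighborhood (the abstract does not name the lattice model), a boolean fluid/solid mask as input, and a compressed adjacency (CSR) output of the kind that multilevel graph partitioners commonly accept. The names `fluid_graph_csr`, `D2Q9_OFFSETS`, `xadj`, and `adjncy` are hypothetical and chosen for illustration only.

```python
# Offsets of the 8 moving directions of a D2Q9 stencil.
# Assumption: the source does not state which lattice model is used.
D2Q9_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1),
                (1, 1), (1, -1), (-1, 1), (-1, -1)]


def fluid_graph_csr(is_fluid):
    """Map the fluid nodes of a 2D mask onto a graph in CSR form.

    is_fluid: list of rows, each a list of bools (True = fluid node).
    Returns (xadj, adjncy, index), where index maps (row, col) to a
    vertex id and the neighbors of vertex v are
    adjncy[xadj[v]:xadj[v + 1]].
    """
    rows, cols = len(is_fluid), len(is_fluid[0])

    # Number only the fluid nodes; solid nodes get no vertex, which is
    # what keeps the representation sparse for high solid fractions.
    index = {}
    for r in range(rows):
        for c in range(cols):
            if is_fluid[r][c]:
                index[(r, c)] = len(index)

    xadj, adjncy = [0], []
    # Dicts preserve insertion order, so vertices are visited by id.
    for (r, c) in index:
        for dr, dc in D2Q9_OFFSETS:
            neighbor = (r + dr, c + dc)
            if neighbor in index:  # edge only between two fluid nodes
                adjncy.append(index[neighbor])
        xadj.append(len(adjncy))
    return xadj, adjncy, index


# Tiny example: 4 fluid nodes in a 2 x 3 domain.
mask = [
    [True,  True,  False],
    [False, True,  True],
]
xadj, adjncy, index = fluid_graph_csr(mask)
```

The resulting arrays describe an undirected graph whose vertices are the fluid nodes only; a partitioner would then assign each vertex to a process so that vertex counts are balanced and the number of cut edges, which corresponds to inter-process communication, is kept small.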