waLBerla is a massively parallel software framework for simulating complex flows with the lattice Boltzmann method (LBM). Performance and scalability results are presented for SuperMUC, the world's fastest x86-based supercomputer, ranked number 6 on the Top500 list, and JUQUEEN, a Blue Gene/Q system ranked number 5. We reach resolutions of more than one trillion cells and perform up to 1.93 trillion cell updates per second using 1.8 million threads. The design and implementation of waLBerla are driven by a careful analysis of performance on current petascale supercomputers. Our fully distributed data structures and algorithms allow for efficient, massively parallel simulations on these machines. Elaborate node-level optimizations and vectorization using SIMD instructions result in highly optimized compute kernels for the single- and two-relaxation-time LBM. Excellent weak and strong scaling is achieved for a complex vascular geometry of the human coronary tree.