We present a diagnostic performance model for bandwidth-limited loop kernels that is founded on an analysis of modern cache-based microarchitectures. The model allows accurate performance prediction and evaluation for existing instruction codes, and it provides an in-depth understanding of how performance at the different memory hierarchy levels is composed. To demonstrate the capabilities of the model, the performance of raw memory load, store, and copy operations and of a stream vector triad is analyzed and benchmarked on three modern x86-type quad-core architectures.
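For orientation, the four kernels named above can be sketched as plain loops. This is a minimal illustration, not the benchmark code used in the work: the function names, the `STREAMS` table, and the helper `bytes_per_iteration` are hypothetical, and the triad is assumed to have the STREAM-style form a[i] = b[i] + s * c[i], which the abstract does not spell out. Python is used only to show the data access pattern of each kernel; it is not suitable for actual bandwidth measurements.

```python
# Illustrative definitions of the four kernels named in the abstract.
# The loop bodies are assumptions; the abstract only names the kernels.

def load_kernel(a):
    """Read every element of a (summed so that the reads have an effect)."""
    total = 0.0
    for x in a:
        total += x
    return total

def store_kernel(a, value):
    """Write the same value to every element of a."""
    for i in range(len(a)):
        a[i] = value

def copy_kernel(dst, src):
    """Copy src into dst element by element."""
    for i in range(len(src)):
        dst[i] = src[i]

def triad_kernel(a, b, c, s):
    """Assumed STREAM-style triad: a[i] = b[i] + s * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + s * c[i]

# Explicit (load streams, store streams) per loop iteration for each kernel.
STREAMS = {"load": (1, 0), "store": (0, 1), "copy": (1, 1), "triad": (2, 1)}

def bytes_per_iteration(kernel, element_size=8):
    """Explicit data volume per iteration, assuming 8-byte (double) elements."""
    loads, stores = STREAMS[kernel]
    return (loads + stores) * element_size
```

The byte count covers only the traffic that is explicit in the loop body. Additional transfers that a cache hierarchy may generate, such as write-allocate loads on store misses, are deliberately left out of this sketch.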