Application performance modeling is an essential part of application and system development as HPC moves into the petascale era and prepares for exascale. However, performance modeling of parallel systems is difficult because of natural variations in measurements and the effects of noise. In this paper, we give a detailed example of a semi-analytical performance-modeling method applied to su3_rmd, a ubiquitous HPC application from the field of lattice Quantum Chromodynamics (QCD), on a variety of parallel computing platforms. We apply statistical techniques that are well known in the natural sciences to model the variance in the input measurements. We combine a simple analytical model, which captures the main characteristics of the code such as the number and size of messages passed and the invocation counts of serial code blocks, with statistically sound curve-fitting methods. The result is an accurate performance model, which we use to characterize application performance on various target architectures. Our fitting techniques allow us to characterize the variance of different performance observations on a given system and to show the influence of noise from different sources. The techniques we developed can be applied to a wide class of bulk-synchronous applications. With this detailed example, we aim to motivate the scientific computing community to develop and use similar performance models for software development and maintenance.
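To make the "semi-analytical" idea concrete, the sketch below pairs analytical counts (message counts and sizes, serial-block invocation counts) with parameters obtained by curve fitting. It is a minimal illustration under stated assumptions, not the authors' model or fitting procedure: it assumes a hypothetical latency/bandwidth form t(s) = alpha + beta * s for a single message of s bytes, fits it by ordinary least squares, and reports the residual standard deviation as a crude indicator of measurement noise. The function names `fit_latency_bandwidth` and `predict_runtime` are hypothetical.

```python
"""Minimal semi-analytical performance model sketch (hypothetical).

Assumptions, not taken from the paper: one message of s bytes costs
alpha + beta * s seconds, and each serial code block has a fixed
per-invocation cost. The fit is plain ordinary least squares.
"""
import math
from dataclasses import dataclass


@dataclass
class LinearFit:
    alpha: float        # fitted intercept: per-message latency in seconds
    beta: float         # fitted slope: seconds per byte
    residual_sd: float  # residual standard deviation: crude noise measure


def fit_latency_bandwidth(sizes, times):
    """Fit times ~ alpha + beta * sizes by ordinary least squares."""
    n = len(sizes)
    if n != len(times) or n < 3:
        raise ValueError("need at least 3 paired (size, time) samples")
    mean_s = sum(sizes) / n
    mean_t = sum(times) / n
    sxx = sum((s - mean_s) ** 2 for s in sizes)
    if sxx == 0:
        raise ValueError("message sizes must not all be identical")
    sxy = sum((s - mean_s) * (t - mean_t) for s, t in zip(sizes, times))
    beta = sxy / sxx
    alpha = mean_t - beta * mean_s
    residuals = [t - (alpha + beta * s) for s, t in zip(sizes, times)]
    # n - 2 degrees of freedom, because two parameters were estimated.
    residual_sd = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    return LinearFit(alpha, beta, residual_sd)


def predict_runtime(fit, messages, blocks):
    """Combine analytical counts with the fitted parameters.

    messages: list of (count, size_in_bytes), one entry per message class
    blocks:   list of (invocations, seconds_per_invocation) per serial block
    """
    comm = sum(count * (fit.alpha + fit.beta * size) for count, size in messages)
    comp = sum(calls * cost for calls, cost in blocks)
    return comm + comp


if __name__ == "__main__":
    # Synthetic ping-pong style timings, for illustration only.
    sizes = [8, 1024, 4096, 16384, 65536]
    times = [2.1e-6, 3.0e-6, 6.2e-6, 18.3e-6, 67.0e-6]
    fit = fit_latency_bandwidth(sizes, times)
    print(fit)
    # Hypothetical per-iteration counts for one bulk-synchronous step.
    print(predict_runtime(fit, [(8, 4096), (2, 8)], [(100, 1.5e-5)]))
```

In practice the per-message and per-block counts would come from analyzing the code, while the fitted parameters and their spread would be re-estimated on each target platform; a fuller treatment would replace the single residual standard deviation with the statistical variance models the abstract refers to.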