Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
HPCVIEW: A Tool for Top-down Analysis of Node Performance
The Journal of Supercomputing
The role of instrumentation and mapping in performance measurement
The role of instrumentation and mapping in performance measurement
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
The Tau Parallel Performance System
International Journal of High Performance Computing Applications
Compensation of Measurement Overhead in Parallel Performance Profiling
International Journal of High Performance Computing Applications
IEEE Transactions on Software Engineering
Open | SpeedShop: An open source infrastructure for parallel performance analysis
Scientific Programming - Large-Scale Programming Tools and Environments
A Generic and Configurable Source-Code Instrumentation Component
ICCS 2009 Proceedings of the 9th International Conference on Computational Science
SPEC MPI2007—an application benchmark suite for parallel systems using MPI
Concurrency and Computation: Practice & Experience - International Supercomputing Conference (ISC07)
HPCTOOLKIT: tools for performance analysis of optimized parallel programs http://hpctoolkit.org
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
The Scalasca performance toolset architecture
Concurrency and Computation: Practice & Experience - Scalable Tools for High-End Computing
Detailed performance analysis using coarse grain sampling
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
Hi-index | 0.00 |
Preparing performance measurements of HPC applications is usually a tradeoff between accuracy and granularity of the measured data. When using direct instrumentation, that is, the insertion of extra code around performance-relevant functions, the measurement overhead increases with the rate at which these functions are visited. If applied indiscriminately, the measurement dilation can even be prohibitive. In this paper, we show how static code analysis in combination with binary rewriting can help eliminate unnecessary instrumentation points based on configurable filter rules. In contrast to earlier approaches, our technique does not rely on dynamic information, making extra runs prior to the actual measurement dispensable. Moreover, the rules can be applied and modified without re-compilation. We evaluate filter rules designed for the analysis of computation and communication performance and show that in most cases the measurement dilation can be reduced to a few percent while still retaining significant detail.