Numerical recipes: the art of scientific computing
Effectiveness of trace sampling for performance debugging tools
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
A scalable cross-platform infrastructure for application performance tuning using hardware counters
Proceedings of the 2000 ACM/IEEE conference on Supercomputing
The Augmint multiprocessor simulation toolkit for Intel x86 architectures
ICCD '96 Proceedings of the 1996 International Conference on Computer Design, VLSI in Computers and Processors
MINT: A Front End for Efficient Simulation of Shared-Memory Multiprocessors
MASCOTS '94 Proceedings of the Second International Workshop on Modeling, Analysis, and Simulation On Computer and Telecommunication Systems
SIGMA: a simulator infrastructure to guide memory analysis
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
A framework for performance modeling and prediction
Proceedings of the 2002 ACM/IEEE conference on Supercomputing
Data Centric Cache Measurement on the Intel Itanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
An API for Runtime Code Patching
International Journal of High Performance Computing Applications
Performance Modeling of Communication and Computation in Hybrid MPI and OpenMP Applications
ICPADS '06 Proceedings of the 12th International Conference on Parallel and Distributed Systems - Volume 2
Log summarization and anomaly detection for troubleshooting distributed systems
GRID '07 Proceedings of the 8th IEEE/ACM International Conference on Grid Computing
Detailed performance analysis using coarse grain sampling
Euro-Par'09 Proceedings of the 2009 international conference on Parallel processing
GRace: a low-overhead mechanism for detecting data races in GPU programs
Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
Automatic structure extraction from MPI applications tracefiles
Euro-Par'07 Proceedings of the 13th international Euro-Par conference on Parallel Processing
Detailed cache simulation can help both system developers and application writers understand an application's performance. However, measuring long-running programs this way can be extremely slow. In this paper we present a technique that dynamically samples trace snippets throughout an application's execution. By judiciously choosing both the frequency at which samples are collected and the points in the control flow where they are taken, our approach achieves better accuracy than sampling a few timesteps at the beginning of execution. We validate the approach using the SIGMA tracing and simulation framework for the IBM Power family of processors.
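To make the sampling trade-off concrete, here is a toy Python sketch. Everything in it is hypothetical and is not the SIGMA implementation: a 64-line direct-mapped cache (`simulate_cache`), a synthetic two-phase trace (`make_phased_trace`), and fixed-period snippets that each start with a cold cache and discard a short warm-up before counting (`sampled_miss_rate`). It models only the sampling frequency; choosing sample points in the control flow, as the abstract describes, is not modeled.

```python
def simulate_cache(addresses, num_lines=64, line_size=64, warmup=0):
    """Simulate a direct-mapped cache; return (hits, misses) counted after `warmup` refs."""
    tags = [None] * num_lines  # one stored tag per cache line (direct-mapped)
    hits = misses = 0
    for i, addr in enumerate(addresses):
        block = addr // line_size
        index, tag = block % num_lines, block // num_lines
        hit = tags[index] == tag
        tags[index] = tag  # fill the line on a miss; a no-op on a hit
        if i < warmup:
            continue  # warm the cache but do not count these references
        if hit:
            hits += 1
        else:
            misses += 1
    return hits, misses


def miss_rate(hits, misses):
    total = hits + misses
    return misses / total if total else 0.0


def sampled_miss_rate(trace, snippet_len, period, warmup):
    """Simulate one snippet of `snippet_len` refs every `period` refs, each starting cold.

    The per-snippet warm-up is an assumption used here to reduce cold-start bias.
    """
    hits = misses = 0
    for start in range(0, len(trace), period):
        h, m = simulate_cache(trace[start:start + snippet_len], warmup=warmup)
        hits += h
        misses += m
    return miss_rate(hits, misses)


def make_phased_trace(n_per_phase=50_000):
    """Hypothetical two-phase trace: a cache-resident loop, then a streaming phase."""
    loop = [(i % 32) * 64 for i in range(n_per_phase)]          # 32 blocks, fits in cache
    stream = [(1_000_000 + i) * 64 for i in range(n_per_phase)]  # every block is new
    return loop + stream


if __name__ == "__main__":
    trace = make_phased_trace()
    full = miss_rate(*simulate_cache(trace))
    beginning = miss_rate(*simulate_cache(trace[:5_000]))
    periodic = sampled_miss_rate(trace, snippet_len=1_000, period=10_000, warmup=100)
    print(f"full={full:.4f} beginning-only={beginning:.4f} periodic={periodic:.4f}")
```

With this trace, simulating only the first 5,000 references sees just the cache-resident loop and reports a miss rate of about 0.6%, whereas ten 1,000-reference snippets spread across the run estimate 50%, close to the full-trace rate of about 50.03%.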