ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Earthquake ground motion modeling on parallel computers
Supercomputing '96 Proceedings of the 1996 ACM/IEEE conference on Supercomputing
Tools for application-oriented performance tuning
ICS '01 Proceedings of the 15th international conference on Supercomputing
LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Data centric cache measurement using hardware and software instrumentation
Data Centric Cache Measurement on the Intel Itanium 2 Processor
Proceedings of the 2004 ACM/IEEE conference on Supercomputing
StatCache: a probabilistic approach to efficient and accurate data locality analysis
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Pinpointing data locality problems using data-centric analysis
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Data-centric analysis using direct measurements is an established and successful performance analysis technique. The information it gathers can map cache misses to program variables, and these mappings can then be used to address data locality problems and other issues. Existing approaches rely on special hardware support to negate 'skid', the gap between the instruction that causes an event and the instruction reported for it. Our approach remains viable when this hardware support is absent and skid is still an issue. Prior methods also maintain runtime information about the memory allocation addresses of variables, which can perturb the program. Our approach uses software analysis to eliminate the need for allocation and free records. We show that, by using heuristics, our technique can attribute cache misses to program variables while preserving the approximate rank order found by traditional techniques. We also show a high correlation between the misses attributed by our approximation and the misses assigned by examining direct measurements.
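The comparison described above, checking that an approximate attribution preserves rank order and correlates with direct measurements, can be sketched as follows. This is a minimal illustration, not the authors' evaluation code: the abstract does not say which correlation measure was used, so Pearson and Spearman coefficients are assumed here, and the function names and the per-variable miss counts in the usage note are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """Rank values from largest (rank 1) down; ties share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the group while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = mean_rank
        pos = end + 1
    return result


def compare_attributions(direct, approx):
    """Return (Pearson, Spearman) between two {variable: miss_count} maps.

    Variables missing from one map are treated as having zero misses
    (an assumption of this sketch).
    """
    names = sorted(set(direct) | set(approx))
    d = [direct.get(name, 0) for name in names]
    a = [approx.get(name, 0) for name in names]
    return pearson(d, a), pearson(ranks(d), ranks(a))
```

For example, with made-up counts `direct = {"A": 900, "B": 400, "C": 100, "D": 50}` and `approx = {"A": 850, "B": 300, "C": 150, "D": 40}`, `compare_attributions(direct, approx)` reports a Spearman coefficient of 1.0 because both maps order the variables identically, even though the individual counts differ.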