Optimally profiling and tracing programs
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static branch frequency and program profile analysis
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Hardware-based profiling: an effective technique for profile-driven optimization
International Journal of Parallel Programming
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exploiting hardware performance counters with flow and context sensitive profiling
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Continuous profiling: where have all the cycles gone?
ACM Transactions on Computer Systems (TOCS)
System support for automatic profiling and optimization
Proceedings of the sixteenth ACM symposium on Operating systems principles
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A hardware-driven profiling scheme for identifying program hot spots to support runtime optimization
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A tutorial on support vector regression
Statistics and Computing
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Low-overhead call path profiling of unmodified, optimized code
Proceedings of the 19th annual international conference on Supercomputing
Online optimizations driven by hardware performance monitoring
Proceedings of the 2007 ACM SIGPLAN conference on Programming language design and implementation
Optimal Insertion of Software Probes in Well-Delimited Programs
IEEE Transactions on Software Engineering
Complementing missing and inaccurate profiling using a minimum cost circulation algorithm
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
Evaluating the accuracy of Java profilers
PLDI '10 Proceedings of the 2010 ACM SIGPLAN conference on Programming language design and implementation
RACEZ: a lightweight and non-invasive race detection tool for production applications
Proceedings of the 33rd International Conference on Software Engineering
Exploiting hardware advances for software testing and debugging (NIER track)
Proceedings of the 33rd International Conference on Software Engineering
MAO -- An extensible micro-architectural optimizer
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
HQEMU: a multi-threaded and retargetable dynamic binary translator on multicores
Proceedings of the Tenth International Symposium on Code Generation and Optimization
THeME: a system for testing by hardware monitoring events
Proceedings of the 2012 International Symposium on Software Testing and Analysis
Siblingrivalry: online autotuning through local competitions
Proceedings of the 2012 international conference on Compilers, architectures and synthesis for embedded systems
A survey and taxonomy of on-chip monitoring of multicore systems-on-chip
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Simple profile rectifications go a long way
ECOOP'13 Proceedings of the 27th European conference on Object-Oriented Programming
Hi-index | 0.01 |
Feedback-directed optimization (FDO) is effective in improving application runtime performance, but has not been widely adopted due to the tedious dual-compilation model, the difficulties in generating representative training data sets, and the high runtime overhead of profile collection. The use of hardware-event sampling to generate estimated edge profiles overcomes these drawbacks. Yet, hardware event samples are typically not precise at the instruction or basic-block granularity. These inaccuracies lead to missed performance when compared to instrumentation-based FDO@. In this paper, we use multiple hardware event profiles and supervised learning techniques to generate heuristics for improved precision of basic-block-level sample profiles, and to further improve the smoothing algorithms used to construct edge profiles. We demonstrate that sampling-based FDO can achieve an average of 78% of the performance gains obtained using instrumentation-based exact edge profiles for SPEC2000 benchmarks, matching or beating instrumentation-based FDO in many cases. The overhead of collection is only 0.74% on average, while compiler based instrumentation incurs 6.8%-53.5% overhead (and 10x overhead on an industrial web search application), and dynamic instrumentation incurs 28.6%-1639.2% overhead.