Optimizing programs at run time makes it possible to apply aggressive optimizations based on information that was not available at compile time. At run time, programs can be adapted to better exploit architectural features, optimize the use of dynamic libraries, and simplify code based on run-time constants.

Our profiling system provides a framework for collecting the information required to perform run-time optimization. We sample the hardware performance registers available on an Itanium processor and select a set of code that is likely to lead to important performance events. We gather distribution information about the performance events we wish to monitor, and we test our traces by estimating how well dynamic patching of a program can execute run-time generated traces.

Our results show that we are able to capture 58% of execution time across various SPEC2000 integer benchmarks by applying our profiling and patching techniques to a relatively small number of frequently executed execution paths. Our profiling and detection system increases execution time by only 2-4%.
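As a rough illustration of the idea of capturing a large share of execution with a small number of frequently executed paths, the following minimal sketch ranks paths by sample count and reports the fraction of samples that the top few cover. It is not the system's actual selection algorithm: the function name `select_hot_paths`, the `max_paths` parameter, and the string path identifiers are all hypothetical, and the samples are synthetic stand-ins for what a hardware performance-counter sampler would deliver.

```python
from collections import Counter


def select_hot_paths(samples, max_paths):
    """Pick the most frequently sampled paths and report their coverage.

    samples   -- iterable of path identifiers, one per profile sample
                 (hypothetical stand-in for hardware counter samples)
    max_paths -- upper bound on how many paths to keep (assumed knob)

    Returns (hot_paths, coverage), where coverage is the fraction of
    all samples that fall on the selected paths.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    # most_common() orders paths by descending sample count.
    hot_paths = [path for path, _ in counts.most_common(max_paths)]
    covered = sum(counts[path] for path in hot_paths)
    coverage = covered / total if total else 0.0
    return hot_paths, coverage


# Synthetic samples: each entry names the path that was executing
# when a (simulated) sampling interrupt fired.
samples = ["A", "A", "B", "A", "C", "B", "A", "D", "A", "B"]
hot, cov = select_hot_paths(samples, max_paths=2)
print(hot, cov)
```

In a setting like the one described, the coverage value plays the role of the reported share of execution time attributable to the selected paths, under the assumption that sample counts are proportional to time spent.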