Dynamic trace selection using performance monitoring hardware sampling

Authors:
Howard Chen;Wei-Chung Hsu;Jiwei Lu;Pen-Chung Yew;Dong-Yuan Chen
Affiliations:
University of Minnesota;University of Minnesota;University of Minnesota;University of Minnesota;Microprocessor Research Labs, Intel
Venue:
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Year:
2003

Citing 24
Cited 18

Using profile information to assist classic code optimizations

Software—Practice & Experience
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Effective compiler support for predicated execution using the hyperblock

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Using branch handling hardware to support profile-driven optimization

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient path profiling

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Optimization for a superscalar out-of-order machine

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Continuous profiling: where have all the cycles gone?

ACM Transactions on Computer Systems (TOCS)
System support for automatic profiling and optimization

Proceedings of the sixteenth ACM symposium on Operating systems principles
Scalable cross-module optimization

PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Optimizing alpha executables on Windows NT with spike

Digital Technical Journal
A hardware mechanism for dynamic extraction and relayout of program hot spots

Proceedings of the 27th annual international symposium on Computer architecture
Instruction path coprocessors

Proceedings of the 27th annual international symposium on Computer architecture
Dynamo: a transparent dynamic optimization system

PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
Rapid profiling via stratified sampling

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Dynamic Binary Translation and Optimization

IEEE Transactions on Computers
Continuous Program Optimization: Design and Evaluation

IEEE Transactions on Computers
rePLay: A Hardware Framework for Dynamic Optimization

IEEE Transactions on Computers
Managing multi-configuration hardware via dynamic working set analysis

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Dynamic and Transparent Binary Translation

Computer
SPEC CPU2000: Measuring CPU Performance in the New Millennium

Computer
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Gprof: A call graph execution profiler

SIGPLAN '82 Proceedings of the 1982 SIGPLAN symposium on Compiler construction

The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
IA-32 Execution Layer: a two-phase dynamic translator designed to support IA-32 applications on Itanium®-based systems

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Dynamic run-time architecture techniques for enabling continuous optimization

Proceedings of the 2nd conference on Computing frontiers
Continuous Optimization

Proceedings of the 32nd annual international symposium on Computer Architecture
Incremental Commit Groups for Non-Atomic Trace Processing

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Beating In-Order Stalls with "Flea-Flicker" Two-Pass Pipelining

IEEE Transactions on Computers
Shadow Profiling: Hiding Instrumentation Costs with Parallelism

Proceedings of the International Symposium on Code Generation and Optimization
Pipa: pipelined profiling and analysis on multi-core systems

Proceedings of the 6th annual IEEE/ACM international symposium on Code generation and optimization
Scenario Based Optimization: A Framework for Statically Enabling Online Optimizations

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
A Practical Approach to Hardware Performance Monitoring Based Dynamic Optimizations in a Production JVM

Proceedings of the 7th annual IEEE/ACM International Symposium on Code Generation and Optimization
Improving instrumentation speed via buffering

Proceedings of the Workshop on Binary Instrumentation and Applications
PiPA: Pipelined profiling and analysis on multicore systems

ACM Transactions on Architecture and Code Optimization (TACO)
Demand-driven software race detection using hardware performance counters

Proceedings of the 38th annual international symposium on Computer architecture
Loaf: a framework and infrastructure for creating online adaptive solutions

Proceedings of the 1st International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
MT-Profiler: a parallel dynamic analysis framework based on two-stage sampling

APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Mining opportunities for code improvement in a just-in-time compiler

CC'10/ETAPS'10 Proceedings of the 19th joint European conference on Theory and Practice of Software, international conference on Compiler Construction
THeME: a system for testing by hardware monitoring events

Proceedings of the 2012 International Symposium on Software Testing and Analysis
Performance patterns and hardware metrics on modern multicore processors: best practices for performance engineering

Euro-Par'12 Proceedings of the 18th international conference on Parallel processing workshops

Quantified Score

Hi-index	0.00

Visualization

Abstract

Optimizing programs at run-time provides opportunities to apply aggressive optimizations to programs based on information that was not available at compile time. At run time, programs can be adapted to better exploit architectural features, optimize the use of dynamic libraries, and simplify code based on run-time constants.Our profiling system provides a framework for collecting information required for performing run-time optimization. We sample the performance hardware registers available on an ltanium processor, and select a set of code that is likely to lead to important performance-events. We gather distribution information about the performance-events we wish to monitor, and test our traces by estimating the ability for dynamic patching of a program to execute run-time generated traces.Our results show that we are able to capture 58% of execution time across various SPEC2000 integer benchmarks using our profile and patching techniques on a relatively small number of frequently executed execution paths. Our profiling and detection system overhead increases execution time by only 2--4%.