This paper presents a novel hardware-based approach for identifying, profiling, and monitoring hot spots to support runtime optimization of general-purpose programs. The approach consists of a set of tightly coupled hardware tables and control logic modules placed in the retirement stage of the processor pipeline, off the critical path. Key features of the design are rapid detection of program hot spots after changes in execution behavior, runtime-tunable selection criteria for hot spot detection, and negligible overhead during application execution. Experiments using several SPEC95 benchmarks, as well as several large Windows NT applications, demonstrate the promise of the proposed design.
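The abstract does not give the table organization or the selection rule, so the following is only a software model built on stated assumptions, not the paper's hardware. It assumes a small table of per-branch execution counters updated as branches retire. A branch becomes a "candidate" once its count crosses a threshold within a refresh interval. A saturating detection counter falls while candidate branches dominate the retired stream and rises otherwise, and a hot spot is reported when that counter reaches zero. Every name (`HotSpotDetector`, `hdc`, `candidate_threshold`, `refresh_interval`, and so on) and every default value is hypothetical. The tunable fields stand in for the "runtime-tunable selection criteria".

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass
class HotSpotDetector:
    # Hypothetical parameters; reassigning them at runtime retunes detection.
    table_size: int = 64            # entries in the branch profile table
    candidate_threshold: int = 16   # executions per interval to become a candidate
    refresh_interval: int = 4096    # retired branches between refreshes
    hdc_max: int = 8192             # saturating hot spot detection counter (HDC)
    dec: int = 2                    # HDC decrement when a candidate branch retires
    inc: int = 1                    # HDC increment when a non-candidate branch retires
    counts: Dict[int, int] = field(init=False, default_factory=dict)
    candidates: Set[int] = field(init=False, default_factory=set)
    hdc: int = field(init=False, default=0)
    since_refresh: int = field(init=False, default=0)

    def __post_init__(self) -> None:
        self.hdc = self.hdc_max

    def _reset(self) -> None:
        # Start profiling afresh after a hot spot has been reported.
        self.counts.clear()
        self.candidates.clear()
        self.hdc = self.hdc_max
        self.since_refresh = 0

    def _refresh(self) -> None:
        # Keep only branches that stayed hot during the interval that just ended;
        # evicting the rest frees table entries for newly active branches.
        self.candidates = {pc for pc, c in self.counts.items()
                           if c >= self.candidate_threshold}
        self.counts = {pc: 0 for pc in self.candidates}
        self.since_refresh = 0

    def retire_branch(self, pc: int) -> Optional[FrozenSet[int]]:
        """Observe one retired branch; return the hot branch set on detection."""
        if pc in self.counts or len(self.counts) < self.table_size:
            self.counts[pc] = self.counts.get(pc, 0) + 1
            if self.counts[pc] >= self.candidate_threshold:
                self.candidates.add(pc)
        if pc in self.candidates:
            self.hdc = max(0, self.hdc - self.dec)
        else:
            self.hdc = min(self.hdc_max, self.hdc + self.inc)
        self.since_refresh += 1
        if self.since_refresh >= self.refresh_interval:
            self._refresh()
        if self.hdc == 0:
            hot = frozenset(self.candidates)
            self._reset()
            return hot
        return None


def run(detector: HotSpotDetector, trace: Iterable[int]) -> List[FrozenSet[int]]:
    """Feed a retired-branch trace (branch PCs) and collect every hot spot report."""
    reports = []
    for pc in trace:
        hot = detector.retire_branch(pc)
        if hot is not None:
            reports.append(hot)
    return reports


if __name__ == "__main__":
    loop = [0x400, 0x410, 0x420, 0x430]  # hypothetical branch PCs of one loop
    for report in run(HotSpotDetector(), loop * 2000):
        print(sorted(hex(pc) for pc in report))
```

In this model, a trace dominated by a few repeating branches drives the counter to zero and yields their PCs. A stream of branches that never repeat leaves it saturated and reports nothing. After a phase change, one report may still include branches from the previous phase until a refresh ages them out. Later reports contain only the new phase.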