Trace selection for compiling large C application programs to microcode
MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Inline function expansion for compiling C programs
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Achieving high instruction cache performance with an optimizing compiler
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Predicting program behavior using real or estimated profiles
PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Using profile information to assist classic code optimizations
Software—Practice & Experience
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Accurate static estimators for program optimization
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Data preload for superscalar and VLIW processors
Data preload for superscalar and VLIW processors
Superblock formation using static program analysis
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Architecture of the Pentium Microprocessor
IEEE Micro
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Importance of profiling and compatibility
ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Accurate and practical profile-driven compilation using the profile buffer
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic trace selection using performance monitoring hardware sampling
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Practical Path Profiling for Dynamic Optimizers
Proceedings of the international symposium on Code generation and optimization
Continuous Path and Edge Profiling
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Profiling over Adaptive Ranges
Proceedings of the International Symposium on Code Generation and Optimization
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Shadow Profiling: Hiding Instrumentation Costs with Parallelism
Proceedings of the International Symposium on Code Generation and Optimization
Formulating and implementing profiling over adaptive ranges
ACM Transactions on Architecture and Code Optimization (TACO)
SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion
Proceedings of the 8th ACM International Conference on Computing Frontiers
Profiling all paths: A new profiling technique for both cyclic and acyclic paths
Journal of Systems and Software
Trace construction using enhanced performance monitoring
Proceedings of the ACM International Conference on Computing Frontiers
Hi-index | 0.00 |
Profile-based optimizations can be used for instruction scheduling, loop scheduling, data preloading, function in-lining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run 2–30 times slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper proposes using existing branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with small slowdown in execution (0.4%–4.6%). This allows a program to be profiled while it is used, eliminating the need for a test input suite. This practically removes the inconvenience of profiling. With contemporary processors driven increasingly by compiler support, hardware-based profiling is important for high-performance systems.