Using branch handling hardware to support profile-driven optimization

Authors:
Thomas M. Conte;Burzin A. Patel;J. Stan Cox
Affiliations:
Department of Electrical and Computer Engineering, University of South Carolina, Columbia, South Carolina;Department of Electrical and Computer Engineering, University of South Carolina, Columbia, South Carolina;Database and Compiler Technology, AT&T Global Information Solutions, Columbia, South Carolina
Venue:
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Year:
1994

Citing 14
Cited 16

Trace selection for compiling large C application programs to microcode

MICRO 21 Proceedings of the 21st annual workshop on Microprogramming and microarchitecture
Inline function expansion for compiling C programs

PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Achieving high instruction cache performance with an optimizing compiler

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Predicting program behavior using real or estimated profiles

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Using profile information to assist classic code optimizations

Software—Practice & Experience
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Branch prediction for free

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A comparison of dynamic branch predictors that use two levels of branch history

ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
Accurate static estimators for program optimization

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Data preload for superscalar and VLIW processors

Data preload for superscalar and VLIW processors
Superblock formation using static program analysis

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Architecture of the Pentium Microprocessor

IEEE Micro
A study of branch prediction strategies

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture

Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Importance of profiling and compatibility

ACM Computing Surveys (CSUR) - Special issue: position statements on strategic directions in computing research
Accurate and practical profile-driven compilation using the profile buffer

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Relational profiling: enabling thread-level parallelism in virtual machines

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic trace selection using performance monitoring hardware sampling

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Targeted Path Profiling: Lower Overhead Path Profiling for Staged Dynamic Optimization Systems

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Practical Path Profiling for Dynamic Optimizers

Proceedings of the international symposium on Code generation and optimization
Continuous Path and Edge Profiling

Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Profiling over Adaptive Ranges

Proceedings of the International Symposium on Code Generation and Optimization
Introspective 3D chips

Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Shadow Profiling: Hiding Instrumentation Costs with Parallelism

Proceedings of the International Symposium on Code Generation and Optimization
Formulating and implementing profiling over adaptive ranges

ACM Transactions on Architecture and Code Optimization (TACO)
SoftHV: a HW/SW co-designed processor with horizontal and vertical fusion

Proceedings of the 8th ACM International Conference on Computing Frontiers
Profiling all paths: A new profiling technique for both cyclic and acyclic paths

Journal of Systems and Software
Trace construction using enhanced performance monitoring

Proceedings of the ACM International Conference on Computing Frontiers

Quantified Score

Hi-index	0.00

Visualization

Abstract

Profile-based optimizations can be used for instruction scheduling, loop scheduling, data preloading, function in-lining, and instruction cache performance enhancement. However, these techniques have not been embraced by software vendors because programs instrumented for profiling run 2–30 times slower, an awkward compile-run-recompile sequence is required, and a test input suite must be collected and validated for each program. This paper proposes using existing branch handling hardware to generate profile information in real time. Techniques are presented for both one-level and two-level branch hardware organizations. The approach produces high accuracy with small slowdown in execution (0.4%–4.6%). This allows a program to be profiled while it is used, eliminating the need for a test input suite. This practically removes the inconvenience of profiling. With contemporary processors driven increasingly by compiler support, hardware-based profiling is important for high-performance systems.