Advanced compiler optimizations for supercomputers
Communications of the ACM - Special issue on parallelism
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Inline function expansion for compiling C programs
PLDI '89 Proceedings of the ACM SIGPLAN 1989 Conference on Programming language design and implementation
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Code duplication: an assist for global instruction scheduling
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Limits of control flow on parallelism
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Avoiding unconditional jumps by code replication
PLDI '92 Proceedings of the ACM SIGPLAN 1992 conference on Programming language design and implementation
Improving the accuracy of dynamic branch prediction using branch correlation
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
An efficient resource-constrained global scheduling technique for superscalar and VLIW processors
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
A comparison of dynamic branch predictors that use two levels of branch history
ISCA '93 Proceedings of the 20th annual international symposium on computer architecture
The multiflow trace scheduling compiler
The Journal of Supercomputing - Special issue on instruction-level parallelism
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Improving semi-static branch prediction by code replication
PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Avoiding conditional branches by code replication
PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A comparative analysis of schemes for correlated branch prediction
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Performance issues in correlated branch prediction schemes
Proceedings of the 28th annual international symposium on Microarchitecture
Dynamic path-based branch correlation
Proceedings of the 28th annual international symposium on Microarchitecture
The predictability of branches in libraries
Proceedings of the 28th annual international symposium on Microarchitecture
An analysis of dynamic branch prediction schemes on system workloads
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Analysis of branch prediction via data compression
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Efficient procedure mapping using cache line coloring
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Implementation and analysis of path history in dynamic branch prediction schemes
ICS '97 Proceedings of the 11th international conference on Supercomputing
Exploiting instruction level parallelism in processors by caching scheduled groups
Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ProfileMe: hardware support for instruction-level profiling on out-of-order processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Predicting data cache misses in non-numeric applications through correlation profiling
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Improving performance by branch reordering
PLDI '98 Proceedings of the ACM SIGPLAN 1998 conference on Programming language design and implementation
Retrospective: alternative implementations of two-level adaptive training branch prediction
25 years of the international symposia on Computer architecture (selected papers)
Implementation and Analysis of Path History in Dynamic Branch Prediction Schemes
IEEE Transactions on Computers
Analyzing the working set characteristics of branch execution
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Better global scheduling using path profiles
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Variable length path branch prediction
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Load-reuse analysis: design and evaluation
Proceedings of the ACM SIGPLAN 1999 conference on Programming language design and implementation
Walk-Time Address Adjustment for Improving the Accuracy of Dynamic Branch Prediction
IEEE Transactions on Computers
IEEE Transactions on Computers
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
Limits of Data Value Predictability
International Journal of Parallel Programming
IEEE Transactions on Computers
Increasing the size of atomic instruction blocks using control flow assertions
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
rePLay: A Hardware Framework for Dynamic Optimization
IEEE Transactions on Computers
Efficient and effective branch reordering using profile data
ACM Transactions on Programming Languages and Systems (TOPLAS)
Branch Effect Reduction Techniques
Computer
Branch Prediction Using Profile Data
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Optimizing indirect branch prediction accuracy in virtual machine interpreters
PLDI '03 Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation
Profile-based dynamic voltage and frequency scaling for a multiple clock domain microprocessor
Proceedings of the 30th annual international symposium on Computer architecture
Extending Path Profiling across Loop Backedges and Procedure Boundaries
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
The Accuracy of Initial Prediction in Two-Phase Dynamic Binary Translators
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Optimizing indirect branch prediction accuracy in virtual machine interpreters
ACM Transactions on Programming Languages and Systems (TOPLAS)
Dynamic branch prediction and control speculation
International Journal of High Performance Systems Architecture
Cache sensitive code arrangement for virtual machine
Transactions on high-performance embedded architectures and compilers III
Hybrid predictors for next location prediction
UIC'06 Proceedings of the Third international conference on Ubiquitous Intelligence and Computing
Static techniques to improve power efficiency of branch predictors
HiPC'04 Proceedings of the 11th international conference on High Performance Computing
Using decision trees to improve program-based and profile-based static branch prediction
ACSAC'05 Proceedings of the 10th Asia-Pacific conference on Advances in Computer Systems Architecture
Hi-index | 0.02 |
Recent work in history-based branch prediction uses novel hardware structures to capture branch correlation and increase branch prediction accuracy. We present a profile-based code transformation that exploits branch correlation to improve the accuracy of static branch prediction schemes. Our general method encodes branch history information in the program counter through the duplication and placement of program basic blocks. For correlation histories of eight branches, our experimental results achieve up to a 14.7% improvement in prediction accuracy over conventional profile-based prediction without any increase in the dynamic instruction count of our benchmark applications. In the majority of these applications, code duplication increases code size by less than 30%. For the few applications with code segments that exhibit exponential branching paths and no branch correlation, simple compile-time heuristics can eliminate these branches as code-transformation candidates.