Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Profile guided code positioning
PLDI '90 Proceedings of the ACM SIGPLAN 1990 conference on Programming language design and implementation
Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Branch classification: a new mechanism for improving branch predictor performance
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evidence-based static branch prediction using machine learning
ACM Transactions on Programming Languages and Systems (TOPLAS)
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Near-optimal intraprocedural branch alignment
Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
The agree predictor: a mechanism for reducing negative branch history interference
Proceedings of the 24th annual international symposium on Computer architecture
Trading conflict and capacity aliasing in conditional branch predictors
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Analyzing the working set characteristics of branch execution
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
The YAGS branch prediction scheme
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Walk-Time Address Adjustment for Improving the Accuracy of Dynamic Branch Prediction
IEEE Transactions on Computers
ICS '99 Proceedings of the 13th international conference on Supercomputing
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Static correlated branch prediction
ACM Transactions on Programming Languages and Systems (TOPLAS)
The impact of delay on the design of branch predictors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The Alpha 21264 Microprocessor
IEEE Micro
The Effect of Code Reordering on Branch Prediction
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Microbenchmarks for determining branch predictor organization
Software—Practice & Experience - Research Articles
The Camino Compiler infrastructure
ACM SIGARCH Computer Architecture News - Special issue on the 2005 workshop on binary instrumentation and application
Clustered indexing for branch predictors
Microprocessors & Microsystems
HitME: low power Hit MEmory buffer for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Blind Optimization for Exploiting Hardware Features
CC '09 Proceedings of the 18th International Conference on Compiler Construction: Held as Part of the Joint European Conferences on Theory and Practice of Software, ETAPS 2009
Studying microarchitectural structures with object code reordering
Proceedings of the Workshop on Binary Instrumentation and Applications
Compiler techniques to improve dynamic branch prediction for indirect jump and call instructions
ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Runtime adaptation: a case for reactive code alignment
Proceedings of the 2nd International Workshop on Adaptive Self-Tuning Computing Systems for the Exaflop Era
Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)
STABILIZER: statistically sound performance evaluation
Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems
A proper performance evaluation system that summarizes code placement effects
Proceedings of the 11th ACM SIGPLAN-SIGSOFT Workshop on Program Analysis for Software Tools and Engineering
Hi-index | 0.00 |
Code placement techniques have traditionally improved instruction fetch bandwidth by increasing instruction locality and decreasing the number of taken branches. However, traditional code placement techniques have less benefit in the presence of a trace cache that alters the placement of instructions in the instruction cache. Moreover, as pipelines have become deeper to accommodate increasing clock rates, branch misprediction penalties have become a significant impediment to performance. We evaluate pattern history table partitioning, a feedback directed code placement technique that explicitly places conditional branches so that they are less likely to interfere destructively with one another in branch prediction tables. On SPEC CPU benchmarks running on an Intel Pentium 4, branch mispredictions are reduced by up to 22% and 3.5% on average. This reduction yields a speedup of up to 16.0% and 4.5% on average. By contrast, branch alignment, a previous code placement technique, yields only up to a 4.7% speedup and less than 1% on average.