The Effect of Code Reordering on Branch Prediction

Authors:
Alex Ramirez;Josep L. Larriba-Pey;Mateo Valero
Venue:
PACT '00 Proceedings of the 2000 International Conference on Parallel Architectures and Compilation Techniques
Year:
2000

Citing 0
Cited 6

Branch Prediction Using Profile Data
Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing

Fetching instruction streams
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture

Hardware Support for Control Transfers in Code Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture

Software Trace Cache
IEEE Transactions on Computers

Code placement for improving dynamic branch prediction accuracy
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation

Combining code reordering and cache configuration
ACM Transactions on Embedded Computing Systems (TECS)

Abstract

Branch prediction accuracy is a key factor in superscalar processor performance. The ability to predict the outcome of a branch allows the processor to use a large instruction window effectively and to extract more Instruction Level Parallelism (ILP). In this paper, we examine the effect of code layout optimizations on branch prediction accuracy and final processor performance. These code-reordering techniques align branches so that they tend to be not taken, achieving better instruction cache performance and increasing the fetch bandwidth. Here we focus on how these optimizations affect both static and dynamic branch prediction. Code reordering mainly increases the number of not-taken branches. This benefits simple static predictors, which reach over 80% prediction accuracy on optimized codes. The change in branch direction has two effects on dynamic branch prediction: on the positive side, it trades negative interference for neutral or positive interference in the prediction tables; on the negative side, it causes a worse distribution of Branch History Register (BHR) values, leaving many possible history values unused. Our results show that code reordering reduces negative Pattern History Table (PHT) interference, increasing branch prediction accuracy on small branch predictors. For example, a 0.5KB gshare predictor improves from 91.4% to 93.6%, and a 0.4KB gskew predictor from 93.5% to 94.4%. For larger history lengths, the large number of not-taken branches can degrade predictor performance on dealiased schemes, such as the 16KB agree predictor, which goes from 96.2% to 95.8%. However, processor performance depends not only on branch prediction accuracy. Layout-optimized codes have much better instruction cache performance and wider fetch bandwidth. Our results show that when all three factors are considered together, code-reordering techniques always improve processor performance. For example, performance still increases by 8% with an agree predictor, which loses prediction accuracy, and by 9% with a gshare predictor, which gains prediction accuracy.
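
To make the terms in the abstract concrete, the following is a minimal sketch of a gshare-style predictor and of a static always-not-taken predictor. It is not the authors' simulator. The class and function names, the 2-bit saturating counters, the 11-bit index, and the initial counter value are all assumptions chosen for illustration; with 2-bit counters, an 11-bit index gives 2048 entries, or 0.5KB of state.

```python
class Gshare:
    """Toy gshare predictor (illustrative assumption, not the paper's model).

    The Pattern History Table (PHT) holds 2-bit saturating counters and is
    indexed by the branch address XORed with the Branch History Register (BHR).
    """

    def __init__(self, index_bits=11):
        self.mask = (1 << index_bits) - 1
        self.pht = [1] * (1 << index_bits)  # 1 = weakly not taken
        self.bhr = 0                        # global history of recent outcomes

    def _index(self, pc):
        # Drop the low address bits (assumes 4-byte instructions), then XOR
        # with the history. Two branches that map to the same entry interfere.
        return ((pc >> 2) ^ self.bhr) & self.mask

    def predict(self, pc):
        # Counter values 2 and 3 mean "predict taken".
        return self.pht[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.pht[i] = min(3, self.pht[i] + 1)
        else:
            self.pht[i] = max(0, self.pht[i] - 1)
        # Shift the outcome into the history register.
        self.bhr = ((self.bhr << 1) | int(taken)) & self.mask


def accuracy(predictor, trace):
    """Fraction of correctly predicted branches in a list of (pc, taken) pairs."""
    correct = 0
    for pc, taken in trace:
        correct += predictor.predict(pc) == taken
        predictor.update(pc, taken)
    return correct / len(trace)


def static_not_taken_accuracy(trace):
    """Accuracy of a static predictor that always predicts not taken."""
    return sum(not taken for _, taken in trace) / len(trace)
```

In this sketch, a trace dominated by not-taken branches keeps the BHR close to all zeros, so only a few PHT entries are ever indexed. This is one way to picture the unused history values that the abstract describes, while the static not-taken predictor benefits directly from the same bias.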