Modern dynamically scheduled processors use branch prediction hardware to speculatively fetch and execute the most likely paths through a program. Complex branch predictors have been proposed that attempt to identify these paths accurately so that the hardware can benefit from out-of-order (OOO) execution. Recent studies have shown that, in spite of such complex prediction schemes, many frequently executed branches remain difficult to predict. Predicated execution has been proposed as an alternative technique to eliminate some of these branches, in forms ranging from restrictive support to full-blown support. We call the restrictive form of predicated execution guarded execution.

In this paper, we propose a new algorithm that uses profiling to selectively perform if-conversion for architectures with guarded execution support. Branch profiling gathers the taken, not-taken, and misprediction counts for every branch. This information, combined with block profiling, is used to select paths that suffer from heavy misprediction and are profitable to if-convert. We then study the effects of three selection criteria (size-based, predictability-based, and profile-based) on net cycle improvement, branch mispredictions, and mis-speculated instructions. We also propose new mechanisms that convert unsafe instructions to a safe form to broaden the applicability of the technique. Finally, we explain numerous adjustments made to the selection criteria to better reflect OOO processor behavior.
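
To make the three selection criteria concrete, the following is a minimal illustrative sketch and not the algorithm proposed in the paper: the abstract does not give a cost model, so everything here is an assumption. It treats each candidate as a simple if-then-else hammock. The names `BranchProfile`, `estimated_gain`, and `select`, the 10-cycle misprediction penalty, the issue width of 4, and the thresholds `max_size` and `min_rate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BranchProfile:
    """Hypothetical per-branch record built from branch and block profiling."""
    taken: int          # times the branch was taken
    not_taken: int      # times the branch fell through
    mispredicts: int    # times the predictor guessed wrong
    taken_size: int     # instructions in the taken-side block
    fallthru_size: int  # instructions in the fall-through block


def misprediction_rate(b: BranchProfile) -> float:
    """Fraction of dynamic executions of the branch that were mispredicted."""
    total = b.taken + b.not_taken
    return b.mispredicts / total if total else 0.0


def estimated_gain(b: BranchProfile, penalty: int = 10, issue_width: int = 4) -> float:
    """Rough net cycles saved by if-converting the hammock (assumed model).

    Saved: every misprediction that disappears is worth `penalty` cycles.
    Lost: after if-conversion both sides are fetched on every execution,
    so each execution also pays for the side it would have skipped.
    """
    saved = b.mispredicts * penalty
    wasted_instrs = b.taken * b.fallthru_size + b.not_taken * b.taken_size
    return saved - wasted_instrs / issue_width


def select(b: BranchProfile, criterion: str,
           max_size: int = 8, min_rate: float = 0.05) -> bool:
    """Decide whether to if-convert under one of the three criteria."""
    if criterion == "size":
        # Size-based: convert only small hammocks, ignoring the profile.
        return b.taken_size + b.fallthru_size <= max_size
    if criterion == "predictability":
        # Predictability-based: convert branches that are hard to predict.
        return misprediction_rate(b) >= min_rate
    if criterion == "profile":
        # Profile-based: convert when the estimated net gain is positive.
        return estimated_gain(b) > 0
    raise ValueError(f"unknown criterion: {criterion}")


# A small, poorly predicted hammock versus a large, well-predicted one.
hard = BranchProfile(taken=500, not_taken=500, mispredicts=300,
                     taken_size=4, fallthru_size=4)
easy = BranchProfile(taken=990, not_taken=10, mispredicts=10,
                     taken_size=2, fallthru_size=20)

for name, b in (("hard", hard), ("easy", easy)):
    decisions = {c: select(b, c) for c in ("size", "predictability", "profile")}
    print(name, decisions, estimated_gain(b))
```

Under these assumed parameters the small, frequently mispredicted hammock is selected by all three criteria, while the large, well-predicted one is rejected by all three. A model for a real OOO core would need further adjustment, as the abstract notes.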