Two-level adaptive training branch prediction
MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Disjoint eager execution: an optimal form of speculative execution
Proceedings of the 28th annual international symposium on Microarchitecture
Integrating a misprediction recovery cache (MRC) into a superscalar pipeline
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Assigning confidence to conditional branch predictions
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Confidence estimation for speculation control
Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution
Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
Instruction fetch mechanisms for multipath execution processors
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 14th international conference on Supercomputing
The impact of delay on the design of branch predictors
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Confidence Estimation for Branch Prediction Reversal
HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Branch Prediction Using Selective Branch Inversion
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Control-Flow Speculation through Value Prediction for Superscalar Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Alpha 21264 Microprocessor Architecture
ICCD '98 Proceedings of the International Conference on Computer Design
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
Power-Aware Control Speculation through Selective Throttling
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fast branch misprediction recovery in out-of-order superscalar processors
Proceedings of the 19th annual international conference on Supercomputing
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors
IEEE Transactions on Computers
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hiding the misprediction penalty of a resource-efficient high-performance processor
ACM Transactions on Architecture and Code Optimization (TACO)
Reducing branch misprediction penalties via adaptive pipeline scaling
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Turbo-ROB: a low cost checkpoint/restore accelerator
HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
RIMP: runtime implicit predication
APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies
Hi-index | 0.00 |
The reasons for performance losses due to conditional branch mispredictions are first studied. Branch misprediction penalties are broken into three categories: pipeline-fill penalty, window-fill penalty, and serialization penalty. The first and third of these produce most of the performance loss, but the second is also significant. Previously proposed dual (or multi) path execution methods attempt to reduce all three penalties, but these methods are also quite complex. Most of the complexity is caused by simultaneously executing instructions from multiple paths.A good engineering compromise is to avoid the complexity of multiple path execution by focusing on methods that reduce only the pipeline and window re-fill penalties. Dual Path Instruction Processing (DPIP) is proposed as a simple mechanism that fetches, decodes, and renames, but does not execute, instructions from the alternative path for low confidence predicted branches at the same time as the predicted path is being executed. All the stages of the pipeline front-end are hidden once the misprediction is detected. This method thus targets the pipeline-fill penalty and is shown to achieve a good trade-off between performance and complexity. To reduce the window-fill penalty, we further propose the addition of a pre-scheduling engine that schedules instructions from the alternative path in an estimated execution order. Thus, after a misprediction, a high number of instructions from the alternate path can be immediately issued to execution, achieving an effect similar to very fast re-filling of the window. Performance evaluation of DPIP in a 14-stage superscalar processor (like IBM Power 4) shows an average IPC improvement of up to 10% for the bzip2 benchmark, and an average of 8% for ten benchmarks from the SPECint95 and SPECint2000 suites.