Dual path instruction processing

Authors:
Juan L. Aragón;José González;Antonio González;James E. Smith
Affiliations:
Universidad de Murcia, Murcia (Spain);Universidad de Murcia, Murcia (Spain);Universitat Politècnica de Catalunya, Barcelona (Spain);University of Wisconsin-Madison, Madison, WI
Venue:
ICS '02 Proceedings of the 16th international conference on Supercomputing
Year:
2002

Citing 19
Cited 10

Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Simultaneous multithreading: maximizing on-chip parallelism

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Disjoint eager execution: an optimal form of speculative execution

Proceedings of the 28th annual international symposium on Microarchitecture
Integrating a misprediction recovery cache (MRC) into a superscalar pipeline

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Assigning confidence to conditional branch predictions

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The bi-mode branch predictor

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Confidence estimation for speculation control

Proceedings of the 25th annual international symposium on Computer architecture
Threaded multiple path execution

Proceedings of the 25th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture

Proceedings of the 25th annual international symposium on Computer architecture
Instruction fetch mechanisms for multipath execution processors

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
A low-complexity issue logic

Proceedings of the 14th international conference on Supercomputing
The impact of delay on the design of branch predictors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Confidence Estimation for Branch Prediction Reversal

HiPC '01 Proceedings of the 8th International Conference on High Performance Computing
A study of branch prediction strategies

ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Branch Prediction Using Selective Branch Inversion

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Control-Flow Speculation through Value Prediction for Superscalar Processors

PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
The Alpha 21264 Microprocessor Architecture

ICCD '98 Proceedings of the International Conference on Computer Design
Data-Flow Prescheduling for Large Instruction Windows in Out-of-Order Processors

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Improving Branch Prediction Accuracy by Reducing Pattern History Table Interference

PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques

Power-Aware Control Speculation through Selective Throttling

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Runahead Execution: An Alternative to Very Large Instruction Windows for Out-of-Order Processors

HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Wrong Path Events: Exploiting Unusual and Illegal Program Behavior for Early Misprediction Detection and Recovery

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
Fast branch misprediction recovery in out-of-order superscalar processors

Proceedings of the 19th annual international conference on Supercomputing
Control Speculation for Energy-Efficient Next-Generation Superscalar Processors

IEEE Transactions on Computers
PathExpander: Architectural Support for Increasing the Path Coverage of Dynamic Bug Detection

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Hiding the misprediction penalty of a resource-efficient high-performance processor

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing branch misprediction penalties via adaptive pipeline scaling

HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Turbo-ROB: a low cost checkpoint/restore accelerator

HiPEAC'08 Proceedings of the 3rd international conference on High performance embedded architectures and compilers
RIMP: runtime implicit predication

APPT'05 Proceedings of the 6th international conference on Advanced Parallel Processing Technologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

The reasons for performance losses due to conditional branch mispredictions are first studied. Branch misprediction penalties are broken into three categories: pipeline-fill penalty, window-fill penalty, and serialization penalty. The first and third of these produce most of the performance loss, but the second is also significant. Previously proposed dual (or multi) path execution methods attempt to reduce all three penalties, but these methods are also quite complex. Most of the complexity is caused by simultaneously executing instructions from multiple paths.A good engineering compromise is to avoid the complexity of multiple path execution by focusing on methods that reduce only the pipeline and window re-fill penalties. Dual Path Instruction Processing (DPIP) is proposed as a simple mechanism that fetches, decodes, and renames, but does not execute, instructions from the alternative path for low confidence predicted branches at the same time as the predicted path is being executed. All the stages of the pipeline front-end are hidden once the misprediction is detected. This method thus targets the pipeline-fill penalty and is shown to achieve a good trade-off between performance and complexity. To reduce the window-fill penalty, we further propose the addition of a pre-scheduling engine that schedules instructions from the alternative path in an estimated execution order. Thus, after a misprediction, a high number of instructions from the alternate path can be immediately issued to execution, achieving an effect similar to very fast re-filling of the window. Performance evaluation of DPIP in a 14-stage superscalar processor (like IBM Power 4) shows an average IPC improvement of up to 10% for the bzip2 benchmark, and an average of 8% for ten benchmarks from the SPECint95 and SPECint2000 suites.