Reducing the Branch Penalty in Pipelined Processors

Author: David J. Lilja
Venue: Computer
Year: 1988

Abstract

A probabilistic model is developed to quantify the performance effect of the branch penalty in a typical pipeline. The branch penalty is analyzed as a function of the fraction of executed instructions that are branches and the probability that a branch is taken. The resulting model gives the fraction of maximum performance achievable under these conditions. Techniques for reducing the branch penalty include static and dynamic branch prediction, the branch target buffer, the delayed branch, branch bypassing and multiple prefetching, branch folding, early resolution of the branch decision in the pipeline, multiple independent instruction streams sharing one pipeline, and the prepare-to-branch instruction.
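
As a rough illustration of the kind of model the abstract describes, the sketch below makes several simplifying assumptions that are not taken from the paper: an ideal pipeline completes one instruction per cycle, every taken branch stalls the pipeline for a fixed `penalty_cycles`, and untaken branches cost nothing extra. Under these assumptions the expected cycles per instruction is 1 + p_b * p_t * b, and the fraction of peak performance is its reciprocal. The function names and parameters are hypothetical, and a small Monte Carlo simulation is included only to check the closed form.

```python
import random


def fraction_of_peak(p_branch: float, p_taken: float, penalty_cycles: int) -> float:
    """Fraction of ideal throughput (one instruction per cycle) retained.

    Hypothetical model: each taken branch stalls the pipeline for
    `penalty_cycles` cycles; untaken branches add no extra cycles.
    """
    if not (0.0 <= p_branch <= 1.0 and 0.0 <= p_taken <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")
    if penalty_cycles < 0:
        raise ValueError("penalty must be non-negative")
    # Expected cycles per instruction: one base cycle plus the expected stall.
    cpi = 1.0 + p_branch * p_taken * penalty_cycles
    return 1.0 / cpi


def simulate(p_branch: float, p_taken: float, penalty_cycles: int,
             n: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the same quantity over n instructions."""
    rng = random.Random(seed)
    cycles = 0
    for _ in range(n):
        cycles += 1  # every instruction occupies one issue cycle
        is_branch = rng.random() < p_branch
        if is_branch and rng.random() < p_taken:
            cycles += penalty_cycles  # taken branch: pipeline stalls
    return n / cycles


if __name__ == "__main__":
    print(f"{fraction_of_peak(0.2, 0.6, 4):.3f}")  # prints 0.676
    print(f"{fraction_of_peak(0.2, 0.6, 2):.3f}")  # prints 0.806
    print(f"{simulate(0.2, 0.6, 4):.3f}")
```

With the illustrative values p_b = 0.2, p_t = 0.6 and a 4-cycle penalty, this simple model retains about 68% of peak throughput; halving the penalty, as resolving the branch earlier in the pipeline might, raises that to about 81%. Prediction-based techniques can be thought of as shrinking the effective fraction of branches that incur the stall, though the paper's own model may treat them differently.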