A probabilistic model is developed to quantify the performance effects of the branch penalty in a typical pipeline. The branch penalty is analyzed as a function of the relative number of branch instructions executed and the probability that a branch is taken. The resulting model shows the fraction of maximum performance achievable under the given conditions. Techniques to reduce the branch penalty include static and dynamic branch prediction, the branch target buffer, the delayed branch, branch bypassing and multiple prefetching, branch folding, early resolution of the branch decision in the pipeline, multiple independent instruction streams in a shared pipeline, and the prepare-to-branch instruction.
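
To make the kind of model described above concrete, the following is a minimal sketch and not necessarily the paper's exact formulation. It assumes an ideal pipeline that completes one instruction per cycle, that only taken branches stall it, and that each taken branch costs a fixed number of lost cycles. The function name `pipeline_efficiency` and the parameters `branch_fraction`, `taken_probability`, and `penalty_cycles` are illustrative names, not taken from the source.

```python
def pipeline_efficiency(branch_fraction, taken_probability, penalty_cycles):
    """Fraction of peak throughput under a simple branch-penalty model.

    Assumptions (illustrative, not from the source):
      - the ideal pipeline completes one instruction per cycle;
      - only taken branches incur a penalty;
      - every taken branch costs the same number of lost cycles.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must be in [0, 1]")
    if not 0.0 <= taken_probability <= 1.0:
        raise ValueError("taken_probability must be in [0, 1]")
    if penalty_cycles < 0:
        raise ValueError("penalty_cycles must be non-negative")

    # Average cycles per instruction: one base cycle plus the expected stall.
    average_cpi = 1.0 + branch_fraction * taken_probability * penalty_cycles
    # Efficiency is the ideal CPI (1.0) divided by the achieved CPI.
    return 1.0 / average_cpi


if __name__ == "__main__":
    # Sweep the taken probability for a hypothetical workload in which
    # 25% of executed instructions are branches and the penalty is 4 cycles.
    for taken in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(taken, round(pipeline_efficiency(0.25, taken, 4), 3))
```

Under these assumptions, each technique listed in the abstract can be read as lowering one of the inputs: prediction and branch target buffers reduce the effective penalty per branch, while delayed branches and branch folding hide or remove some of the lost cycles.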