Reducing the cost of branches

Authors:
S. McFarling; J. Hennessy
Affiliations:
Computer Systems Laboratory, Stanford University;Computer Systems Laboratory, Stanford University
Venue:
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Year:
1986

Citing 7
Cited 106

Reduced instruction set computers

Communications of the ACM - Special section on computer architecture
Postpass Code Optimization of Pipeline Constraints

ACM Transactions on Programming Languages and Systems (TOPLAS)
Hardware/software tradeoffs for increased performance

ASPLOS I Proceedings of the first international symposium on Architectural support for programming languages and operating systems
A Characterization of Processor Performance in the VAX-11/780

ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
A portable machine-independent global optimizer--design and measurements
Code optimization of pipeline constraints
Reduced instruction set computer architectures for vlsi (microprocessor, risc, multiple-windows - of - registers)

Branch folding in the CRISP microprocessor: reducing branch delay to zero

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of branch architectures

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Checkpoint repair for out-of-order execution machines

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
WISQ: a restartable architecture using queues

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Architectural tradeoffs in the design of MIPS-X

ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Checkpoint repair for high-performance out-of-order execution machines

IEEE Transactions on Computers
Measurement and evaluation of the MIPS architecture and processor

ACM Transactions on Computer Systems (TOCS)
Reducing the Branch Penalty in Pipelined Processors

Computer
A novel effective address calculation mechanism for RISC microprocessors

ACM SIGARCH Computer Architecture News - Special Issue: Architectural Support for Operating Systems
Operation scheduling in reconfigurable, multifunction pipelines

ACM SIGMICRO Newsletter
Limits on multiple instruction issue

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Comparing software and hardware schemes for reducing the cost of branches

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Architectural and organizational tradeoffs in the design of the MultiTitan CPU

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
An architecture framework for application-specific and scalable architectures

ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Forward semantic: a compiler-assisted instruction fetch method for heavily pipelined processors

MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
A flexible VLSI core for an adaptable architecture

MICRO 22 Proceedings of the 22nd annual workshop on Microprogramming and microarchitecture
Instruction Issue Logic for High-Performance, Interruptible, Multiple Functional Unit, Pipelined Computers

IEEE Transactions on Computers
Reducing the branch penalty by rearranging instructions in a double-width memory

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Limits of instruction-level parallelism

ASPLOS IV Proceedings of the fourth international conference on Architectural support for programming languages and operating systems
Predicting program behavior using real or estimated profiles

PLDI '91 Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation
OHMEGA: a VLSI superscalar processor architecture for numerical applications

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
An empirical study of the CRAY Y-MP processor using the Perfect club benchmarks

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
IMPACT: an architectural framework for multiple-instruction-issue processors

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A parallel pipelined processor with conditional instruction execution

ACM SIGARCH Computer Architecture News - Symposium on parallel algorithms and architectures
The effect of employing advanced branching mechanisms in superscalar processors

ACM SIGARCH Computer Architecture News
Exploiting multi-way branching to boost superscalar processor performance

ACM SIGPLAN Notices
DSNS (dynamically-hazard-resolved statically-code-scheduled, nonuniform superscalar): yet another superscalar processor architecture

ACM SIGARCH Computer Architecture News
Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Branch Strategies: Modeling and Optimization (Pipeline Processing)

IEEE Transactions on Computers
Limits of control flow on parallelism

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Performance evaluation of a decoded instruction cache for variable instruction-length computers

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Alternative implementations of two-level adaptive branch prediction

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Performance optimization of pipelined primary cache

ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Instruction-level parallelism from execution interlock collapsing

ACM SIGARCH Computer Architecture News
Improving the accuracy of dynamic branch prediction using branch correlation

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Predicting conditional branch directions from previous runs of a program

ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Improving instruction supply efficiency in superscalar architectures using instruction trace buffers

SAC '92 Proceedings of the 1992 ACM/SIGAPP Symposium on Applied computing: technological challenges of the 1990's
Branch merging for effective exploitation of instruction-level parallelism

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Y-Pipe: a conditional branching scheme without pipeline delays

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
An out-of-order superscalar processor with speculative execution and fast, precise interrupts

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Evaluation of A+B=K Conditions Without Carry Propagation

IEEE Transactions on Computers
Branch prediction for free

PLDI '93 Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation
Reducing indirect function call overhead in C++ programs

POPL '94 Proceedings of the 21st ACM SIGPLAN-SIGACT symposium on Principles of programming languages
Improving semi-static branch prediction by code replication

PLDI '94 Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation
Fast and accurate instruction fetch and branch prediction

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
The impact of unresolved branches on branch prediction scheme performance

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Branch with masked squashing in superpipelined processors

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Branch classification: a new mechanism for improving branch predictor performance

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
The effect of speculatively updating branch history on branch prediction accuracy, revisited

MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Improving the accuracy of static branch prediction using branch correlation

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Reducing branch costs via branch alignment

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Accurate static branch prediction by value range propagation

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
Corpus-based static branch prediction

PLDI '95 Proceedings of the ACM SIGPLAN 1995 conference on Programming language design and implementation
A comparative analysis of schemes for correlated branch prediction

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Next cache line and set prediction

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction cache fetch policies for speculative execution

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Direct-mapped versus set-associative pipelined caches

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
The influence of branch prediction table interference on branch prediction scheme performance

PACT '95 Proceedings of the IFIP WG10.3 working conference on Parallel architectures and compilation techniques
Performance issues in correlated branch prediction schemes

Proceedings of the 28th annual international symposium on Microarchitecture
Partial resolution in branch target buffers

Proceedings of the 28th annual international symposium on Microarchitecture
A system level perspective on branch architecture performance

Proceedings of the 28th annual international symposium on Microarchitecture
Alternative implementations of hybrid branch predictors

Proceedings of the 28th annual international symposium on Microarchitecture
An analysis of dynamic branch prediction schemes on system workloads

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Correlation and aliasing in dynamic branch predictors

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Performance comparison of ILP machines with cycle time evaluation

ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
Evidence-based static branch prediction using machine learning

ACM Transactions on Programming Languages and Systems (TOPLAS)
Accurate and practical profile-driven compilation using the profile buffer

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Hardware implementation of a general multi-way jump mechanism

MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
An instruction reorderer for pipelined computers

MICRO 23 Proceedings of the 23rd annual workshop and symposium on Microprogramming and microarchitecture
Prophetic branches: a branch architecture for code compaction and efficient execution

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
MIDEE: smoothing branch and instruction cache miss penalties on deep pipelines

MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Near-optimal intraprocedural branch alignment

Proceedings of the ACM SIGPLAN 1997 conference on Programming language design and implementation
Target prediction for indirect jumps

Proceedings of the 24th annual international symposium on Computer architecture
The agree predictor: a mechanism for reducing negative branch history interference

Proceedings of the 24th annual international symposium on Computer architecture
Partial Resolution in Branch Target Buffers

IEEE Transactions on Computers
IMPACT: an architectural framework for multiple-instruction-issue processors

25 years of the international symposia on Computer architecture (selected papers)
Alternative implementations of two-level adaptive branch prediction

25 years of the international symposia on Computer architecture (selected papers)
Compact and efficient presentation conversion code

IEEE/ACM Transactions on Networking (TON)
Using value prediction to increase the power of speculative execution hardware

ACM Transactions on Computer Systems (TOCS)
A Practical Methodology for the Formal Verification of RISC Processors

Formal Methods in System Design
Walk-Time Address Adjustment for Improving the Accuracy of Dynamic Branch Prediction

IEEE Transactions on Computers
Control flow optimization for supercomputer scalar processing

ICS '89 Proceedings of the 3rd international conference on Supercomputing
Instruction fetch unit for parallel execution of branch instructions

ICS '89 Proceedings of the 3rd international conference on Supercomputing
LISP on a reduced-instruction-set-processor

LFP '86 Proceedings of the 1986 ACM conference on LISP and functional programming
Performance comparison of load/store and symmetric instruction set architectures

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Reducing the cost of branches by using registers

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Boosting beyond static scheduling in a superscalar processor

ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Software profiling for hot path prediction: less is more

ACM SIGPLAN Notices
Software profiling for hot path prediction: less is more

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
The Gmicro/100 32-Bit Microprocessor

IEEE Micro
Efficient Instruction Sequencing with Inline Target Insertion

IEEE Transactions on Computers
Reducing Branch Delay to Zero in Pipelined Processors

IEEE Transactions on Computers
Branch Target Buffer Design and Optimization

IEEE Transactions on Computers
Performance Evaluation of a Decoded Instruction Cache for Variable Instruction Length Computers

IEEE Transactions on Computers
Optimal 2-Bit Branch Predictors

IEEE Transactions on Computers
The Performance of Counter- and Correlation-Based Schemes for Branch Target Buffers

IEEE Transactions on Computers
Branch Prediction Using Profile Data

Euro-Par '01 Proceedings of the 7th International Euro-Par Conference Manchester on Parallel Processing
Reality-based optimization

Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Implicit Signature Checking

FTCS '95 Proceedings of the Twenty-Fifth International Symposium on Fault-Tolerant Computing
Predicting program behavior using real or estimated profiles

ACM SIGPLAN Notices - Best of PLDI 1979-1999
Improving WCET by applying a WC code-positioning optimization

ACM Transactions on Architecture and Code Optimization (TACO)
Post Register Allocation Spill Code Optimization

Proceedings of the International Symposium on Code Generation and Optimization
Reducing the cost of conditional transfers of control by using comparison specifications

Proceedings of the 2006 ACM SIGPLAN/SIGBED conference on Language, compilers, and tool support for embedded systems
High-level power analysis for multi-core chips

CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
TRICK: tracking and reusing compiler's knowledge

ACM SIGPLAN Notices

Quantified Score

Hi-index	0.04

Abstract

Pipelining is the major organizational technique that computers use to reach higher single-processor performance. A fundamental disadvantage of pipelining is the loss incurred due to branches that require stalling or flushing the pipeline. Both hardware solutions and architectural changes have been proposed to overcome these problems. This paper examines a range of schemes for reducing branch cost, focusing on both static (compile-time) and dynamic (hardware-assisted) prediction of branches. These schemes are investigated from quantitative performance and implementation viewpoints.
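
To make the static versus dynamic distinction in the abstract concrete, the following is a minimal sketch and not the paper's own method or measurements. It assumes a fixed "always taken" guess as the static scheme and a single 2-bit saturating counter as the dynamic scheme. The trace, the initial counter state, and the 2-cycle misprediction penalty are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TwoBitCounter:
    """Dynamic predictor: one 2-bit saturating counter (states 0..3)."""
    state: int = 2  # assumed initial state: weakly taken

    def predict(self) -> bool:
        # States 2 and 3 predict taken; states 0 and 1 predict not taken.
        return self.state >= 2

    def update(self, taken: bool) -> None:
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def static_mispredictions(outcomes, predict_taken=True):
    """Static scheme: the prediction is fixed before the program runs."""
    return sum(1 for taken in outcomes if taken != predict_taken)


def dynamic_mispredictions(outcomes):
    """Dynamic scheme: the prediction adapts to the branch's recent outcomes."""
    counter = TwoBitCounter()
    misses = 0
    for taken in outcomes:
        if counter.predict() != taken:
            misses += 1
        counter.update(taken)
    return misses


def average_branch_cost(misses, total, penalty_cycles):
    """Average extra cycles per branch, assuming a fixed misprediction penalty."""
    return misses * penalty_cycles / total


# Hypothetical trace: a branch that is taken 10 times, then not taken 10 times.
trace = [True] * 10 + [False] * 10
PENALTY = 2  # assumed cycles lost per misprediction

static_misses = static_mispredictions(trace)
dynamic_misses = dynamic_mispredictions(trace)
print("static :", static_misses, average_branch_cost(static_misses, len(trace), PENALTY))
print("dynamic:", dynamic_misses, average_branch_cost(dynamic_misses, len(trace), PENALTY))
```

On this trace the fixed guess mispredicts every not-taken outcome, while the counter adapts after two misses. On a branch whose behavior never changes, the two schemes would perform about the same, which is one reason comparisons of this kind depend heavily on the workload.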