Branch Target Buffer Design and Optimization

Authors:
C. H. Perleberg; A. J. Smith
Venue:
IEEE Transactions on Computers
Year:
1993

Abstract

A branch target buffer (BTB) can reduce the performance penalty of branches in pipelined processors by predicting the path of the branch and caching information used by the branch. Two major issues in designing a BTB that achieves maximum performance with a limited number of bits allocated to its implementation are discussed. The first is BTB management. A method for discarding branches from the BTB is examined: it discards the branch with the smallest expected value for improving performance, and it outperforms the least recently used (LRU) strategy by a small margin, at the cost of additional complexity. The second issue is what information to store in the BTB. A BTB entry can consist of one or more of the following: branch tag, prediction information, the branch target address, and instructions at the branch target. Various BTB designs, with one or more of these fields, are evaluated and compared.
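
To make the entry fields and the LRU baseline concrete, the following is a minimal Python sketch of a BTB, not the design evaluated in the paper. Several choices are assumptions made only for illustration: the buffer is fully associative, the full branch address serves as the tag, the prediction information is a 2-bit saturating counter, entries are allocated only for taken branches, and target instructions are not cached. The names `BTBEntry`, `BranchTargetBuffer`, `predict`, and `update` are hypothetical. The expected-value discard method is not shown, because the abstract does not define how that value is computed.

```python
from collections import OrderedDict
from dataclasses import dataclass


@dataclass
class BTBEntry:
    target: int       # cached branch target address
    counter: int = 2  # assumed 2-bit saturating counter: 0-1 not taken, 2-3 taken


class BranchTargetBuffer:
    """Fully associative BTB with LRU replacement (illustrative sketch only)."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        # Maps branch address (used as the tag) to its entry.
        # Ordering runs from least to most recently used.
        self.entries: "OrderedDict[int, BTBEntry]" = OrderedDict()

    def predict(self, pc: int):
        """Return the cached target if the branch hits and is predicted taken.

        Returns None on a BTB miss or a not-taken prediction.
        """
        entry = self.entries.get(pc)
        if entry is None:
            return None
        self.entries.move_to_end(pc)  # mark as most recently used
        return entry.target if entry.counter >= 2 else None

    def update(self, pc: int, taken: bool, target: int) -> None:
        """Record the resolved outcome of the branch at address pc."""
        entry = self.entries.get(pc)
        if entry is None:
            if not taken:
                return  # assumption: allocate entries only for taken branches
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # discard the LRU entry
            self.entries[pc] = BTBEntry(target=target)
            return
        self.entries.move_to_end(pc)
        if taken:
            entry.counter = min(3, entry.counter + 1)
            entry.target = target
        else:
            entry.counter = max(0, entry.counter - 1)


if __name__ == "__main__":
    btb = BranchTargetBuffer(capacity=2)
    btb.update(0x100, taken=True, target=0x200)
    print(btb.predict(0x100) == 0x200)  # hit, predicted taken
    btb.update(0x104, taken=True, target=0x300)
    btb.update(0x108, taken=True, target=0x400)  # buffer full: LRU entry discarded
    print(btb.predict(0x100) is None)  # 0x100 was the LRU entry
```

Swapping in a different discard policy would only require changing the `popitem(last=False)` line, which is where a policy like the one examined in the paper would choose its victim.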