Alternative implementations of two-level adaptive branch prediction

Authors:
Tse-Yu Yeh;Yale N. Patt
Venue:
ISCA '92 Proceedings of the 19th annual international symposium on Computer architecture
Year:
1992

Abstract

As the issue rate and pipeline depth of high-performance superscalar processors increase, an excellent branch predictor becomes more vital to delivering the potential performance of a wide-issue, deeply pipelined microarchitecture. We propose a new dynamic branch predictor (Two-Level Adaptive Branch Prediction) that achieves substantially higher accuracy than any other scheme reported in the literature. The mechanism uses two levels of branch history information to make predictions: the history of the last k branches encountered, and the branch behavior for the last s occurrences of the specific pattern of these k branches. We have identified three variations of Two-Level Adaptive Branch Prediction, depending on how finely we resolve the history information gathered. We compute the hardware costs of implementing each of the three variations and use these costs in evaluating their relative effectiveness. We measure the branch prediction accuracy of the three variations of Two-Level Adaptive Branch Prediction, along with several other popular proposed dynamic and static prediction schemes, on the SPEC benchmarks. We show that the average prediction accuracy for Two-Level Adaptive Branch Prediction is 97 percent, while the other known schemes achieve at most 94.4 percent average prediction accuracy. We measure the effectiveness of different prediction algorithms and different amounts of history and pattern information. We also measure the cost each variation incurs to reach the same prediction accuracy.
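
The two levels of history described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a single global k-bit history register (only one possible way of resolving the history information) and uses a 2-bit saturating counter per pattern as a stand-in for "the branch behavior for the last s occurrences" of that pattern. The names `TwoLevelPredictor` and `accuracy` are hypothetical.

```python
class TwoLevelPredictor:
    """Illustrative sketch: a k-bit history register indexes a pattern table."""

    def __init__(self, k=4):
        self.k = k
        self.mask = (1 << k) - 1
        # First level: outcomes of the last k branches (1 = taken, 0 = not taken).
        self.history = 0
        # Second level: one 2-bit saturating counter per history pattern.
        # Assumption: the counter summarizes recent behavior under that pattern.
        # Counters start at 1 (weakly not taken).
        self.counters = [1] * (1 << k)

    def predict(self):
        # Predict taken when the counter for the current pattern is 2 or 3.
        return self.counters[self.history] >= 2

    def update(self, taken):
        # Train the counter selected by the current pattern, then shift the
        # actual outcome into the history register.
        c = self.counters[self.history]
        self.counters[self.history] = min(3, c + 1) if taken else max(0, c - 1)
        self.history = ((self.history << 1) | int(taken)) & self.mask


def accuracy(outcomes, k=4):
    """Fraction of outcomes predicted correctly, predicting before each update."""
    predictor = TwoLevelPredictor(k)
    correct = 0
    for taken in outcomes:
        correct += predictor.predict() == taken
        predictor.update(taken)
    return correct / len(outcomes)
```

On a repeating taken, taken, not-taken sequence, a 4-bit history identifies the position within the pattern, so the sketch mispredicts only during warm-up. With a 1-bit history, the same sequence stays ambiguous after a taken branch and accuracy drops sharply.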