Reconsidering Complex Branch Predictors

Authors:
Daniel A. Jiménez
Affiliations:
-
Venue:
HPCA '03 Proceedings of the 9th International Symposium on High-Performance Computer Architecture
Year:
2003

Citing 12
Cited 16

Two-level adaptive training branch prediction

MICRO 24 Proceedings of the 24th annual international symposium on Microarchitecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Multiple-block ahead branch predictors

Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trading conflict and capacity aliasing in conditional branch predictors

Proceedings of the 24th annual international symposium on Computer architecture
The cascaded predictor: economical and adaptive branch target prediction

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
The impact of delay on the design of branch predictors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design tradeoffs for the Alpha EV8 conditional branch predictor

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Neural methods for dynamic branch prediction

ACM Transactions on Computer Systems (TOCS)
The Alpha 21264 Microprocessor

IEEE Micro
Improving branch prediction by understanding branch behavior

Improving branch prediction by understanding branch behavior

Effective ahead pipelining of instruction block address generation

Proceedings of the 30th annual international symposium on Computer architecture
Fast Path-Based Neural Branch Prediction

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Prophet/Critic Hybrid Branch Prediction

Proceedings of the 31st annual international symposium on Computer architecture
Improved latency and accuracy for neural branch prediction

ACM Transactions on Computer Systems (TOCS)
Piecewise Linear Branch Prediction

Proceedings of the 32nd annual international symposium on Computer Architecture
Analysis of the O-GEometric History Length Branch Predictor

Proceedings of the 32nd annual international symposium on Computer Architecture
A Simple Divide-and-Conquer Approach for Neural-Class Branch Prediction

Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
Evaluating trace cache energy efficiency

ACM Transactions on Architecture and Code Optimization (TACO)
Enlarging Instruction Streams

IEEE Transactions on Computers
A latency-conscious SMT branch prediction architecture

International Journal of High Performance Computing and Networking
Generalizing neural branch prediction

ACM Transactions on Architecture and Code Optimization (TACO)
Phantom-BTB: a virtualized branch target buffer design

Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Architectural core salvaging in a multi-core processor for hard-error tolerance

Proceedings of the 36th annual international symposium on Computer architecture
Multiple stream prediction

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems
A new case for the TAGE branch predictor

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
A novel architecture for ahead branch prediction

Frontiers of Computer Science: Selected Publications from Chinese Universities

Quantified Score

Hi-index	0.00

Visualization

Abstract

To sustain instruction throughput rates in more aggressively clocked microarchitectures, microarchitects have incorporated larger and more complex branch predictors into their designs, taking advantage of the increasing numbers of transistors available on a chip. Unfortunately, because of penalties associated with their implementations, the extra accuracy provided by many branch predictors does not produce a proportionate increase in performance. Specifically, we show that the techniques used to hide the latency of a large and complex branch predictor do not scale well and will be unable to sustain IPC for deeper pipelines.We investigate a different way to build large branch predictors. We propose an alternative predictor design that completely hides predictor latency so that accuracy and hardware budget are the only factors that affect the efficiency of the predictor. Our simple design allows the predictor to be pipelined efficiently by avoiding difficulties introduced by complex predictors. Because this predictor eliminates the penalties associated with complex predictors,overall performance exceeds that of even the most accurate known branch predictors in the literature at large hardware budgets. We conclude that as chip densities increase in the next several years, the accuracy of complex branch predictors must be weighed against the performance benefits of simple branch predictors.