Effective ahead pipelining of instruction block address generation

Authors:
André Seznec;Antony Fraboulet
Affiliations:
IRISA/INRIA Rennes, France;IRISA/INRIA Rennes, France
Venue:
Proceedings of the 30th annual international symposium on Computer architecture
Year:
2003

Abstract

On an N-way issue superscalar processor, the front-end instruction fetch engine must deliver instructions to the execution core at a sustained rate higher than N instructions per cycle. The instruction address generator/predictor (IAG) therefore has to predict the instruction flow at an even higher rate, without sacrificing prediction accuracy.

Achieving high accuracy on this prediction becomes more critical as pipelines grow deeper with each new generation of processors. As a result, very complex IAGs are used, featuring different predictors for jumps, returns, conditional and unconditional branches, together with complex logic. Usually, the IAG uses the information (branch histories, fetch addresses, ...) available in a given cycle to predict the next fetch address(es). Unfortunately, a complex IAG cannot deliver a prediction within a short cycle. Processors therefore rely on a hierarchy of IAGs with increasing accuracy but also increasing latency: the accurate but slow IAG is used to correct the fast but less accurate IAG. A significant part of the potential instruction bandwidth is often wasted in pipeline bubbles due to these corrections.

As an alternative to a hierarchy of IAGs, it is possible to initiate instruction address generation several cycles ahead of its use. In this paper, we explore such an ahead pipelined IAG in detail. The example illustrated in this paper shows that, even when the instruction address generation is (partially) initiated five cycles ahead of its use, it is possible to reach approximately the same prediction accuracy as that of a conventional one-block ahead complex IAG. The solution presented in this paper makes it possible to deliver a sustained address generation rate close to one instruction block per cycle with state-of-the-art accuracy.
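
The bandwidth cost of a hierarchy of IAGs can be illustrated with a back-of-the-envelope model. The sketch below is not taken from the paper: the function name, the cost model (each correction by the slow IAG inserts `slow_latency - 1` bubble cycles), and the numbers are all assumptions chosen only for illustration.

```python
def sustained_blocks_per_cycle(override_rate: float, slow_latency: int) -> float:
    """Toy estimate of instruction blocks delivered per cycle by a two-level IAG.

    Assumptions (hypothetical, not from the paper):
      - the fast IAG produces one block address per cycle;
      - the slow IAG answers after `slow_latency` cycles;
      - each time the slow IAG corrects the fast one, `slow_latency - 1`
        cycles of fetch are lost as pipeline bubbles.
    """
    if not 0.0 <= override_rate <= 1.0:
        raise ValueError("override_rate must be in [0, 1]")
    if slow_latency < 1:
        raise ValueError("slow_latency must be >= 1")
    # Average number of bubble cycles paid per fetched block.
    bubbles_per_block = override_rate * (slow_latency - 1)
    # One useful cycle per block plus the average bubbles.
    return 1.0 / (1.0 + bubbles_per_block)


# Hypothetical parameters: 10% of blocks corrected, 3-cycle slow IAG.
hierarchy = sustained_blocks_per_cycle(override_rate=0.10, slow_latency=3)
# Idealized limit with no corrections at all.
no_corrections = sustained_blocks_per_cycle(override_rate=0.0, slow_latency=3)

print(f"with corrections:    {hierarchy:.3f} blocks/cycle")
print(f"without corrections: {no_corrections:.3f} blocks/cycle")
```

Under these assumed numbers, corrections alone cost roughly one sixth of the address generation bandwidth. The zero-correction case is only an idealized upper bound; it does not model the ahead pipelined IAG itself, whose measured rate the abstract describes as close to one block per cycle.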