A basic rule in computer architecture is that a processor cannot execute an application faster than it fetches its instructions. This paper presents a novel, cost-effective mechanism called the two-block ahead branch predictor. Information from the current instruction block is used not to predict the address of the next instruction block, but to predict the address of the block that follows it. This approach overcomes the instruction fetch bottleneck of wide-dispatch "brainiac" processors by enabling them to predict the addresses of two instruction blocks efficiently in a single cycle. The predictor also lets "speed demon" processors pipeline the branch prediction process, either to achieve a higher clock rate or to improve prediction accuracy with larger prediction structures. Moreover, unlike previously proposed multiple predictor schemes, multiple-block ahead branch predictors can use any branch prediction scheme to deliver the very accurate predictions required for high performance on superscalar processors.
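The core idea, indexing the predictor with the current block in order to predict the block after the next one, can be illustrated with a toy model. The sketch below is not the paper's hardware design: it assumes a plain last-value table keyed by block address, and the names `TwoBlockAheadPredictor` and `run` and the example addresses are hypothetical, chosen only for illustration.

```python
class TwoBlockAheadPredictor:
    """Toy last-value predictor (an assumption, not the paper's scheme).

    Indexed by the address of block i, it predicts the address of
    block i+2 rather than block i+1.
    """

    def __init__(self):
        # current block address -> last observed address of the block after next
        self.table = {}

    def predict(self, current_block):
        # Returns None when this block has not been seen before.
        return self.table.get(current_block)

    def update(self, block, block_after_next):
        self.table[block] = block_after_next


def run(trace):
    """Replay a sequence of block addresses.

    Returns (correct, total), counting only the steps where the
    predictor actually produced a prediction.
    """
    predictor = TwoBlockAheadPredictor()
    correct = 0
    total = 0
    for i in range(len(trace) - 2):
        guess = predictor.predict(trace[i])
        if guess is not None:
            total += 1
            if guess == trace[i + 2]:
                correct += 1
        # Train with the block that really followed two steps later.
        predictor.update(trace[i], trace[i + 2])
    return correct, total


# Hypothetical trace: a loop over three instruction blocks, repeated four times.
loop_trace = [0x100, 0x120, 0x140] * 4
print(run(loop_trace))
```

Because the prediction for block i+2 is produced while block i is being fetched, a real implementation would have an extra cycle in which to complete the table lookup, or could issue two block addresses per cycle. This sketch models only the indexing relationship, not timing.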