A low-complexity fetch architecture for high-performance superscalar processors

Authors:
Oliverio J. Santana;Alex Ramirez;Josep L. Larriba-Pey;Mateo Valero
Affiliations:
Universitat Politécnica de Catalunya, Barcelona, Spain;Universitat Politécnica de Catalunya, Barcelona, Spain;Universitat Politécnica de Catalunya, Barcelona, Spain;Universitat Politécnica de Catalunya, Barcelona, Spain
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2004

Citing 25
Cited 4

Branch history table prediction of moving target branches due to subroutine returns

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
A comprehensive instruction fetch mechanism for a processor supporting speculative execution

MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Fast and accurate instruction fetch and branch prediction

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Trace cache: a low latency approach to high bandwidth instruction fetching

Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Path-based next trace prediction

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Alternative fetch and issue policies for the trace cache fetch mechanism

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The cascaded predictor: economical and adaptive branch target prediction

MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A Trace Cache Microarchitecture and Evaluation

IEEE Transactions on Computers - Special issue on cache memory and related problems
The block-based trace cache

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
A scalable front-end architecture for fast instruction delivery

ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Software trace cache

ICS '99 Proceedings of the 13th international conference on Supercomputing
Clock rate versus IPC: the end of the road for conventional microarchitectures

Proceedings of the 27th annual international symposium on Computer architecture
The impact of delay on the design of branch predictors

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Increasing the size of atomic instruction blocks using control flow assertions

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Code layout optimizations for transaction processing workloads

ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design tradeoffs for the Alpha EV8 conditional branch predictor

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Basic Block Distribution Analysis to Find Periodic Behavior and Simulation Points in Applications

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Filtering Techniques to Improve Trace-Cache Efficiency

Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
A Comprehensive Analysis of Indirect Branch Prediction

ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Fetching instruction streams

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Dynamic Branch Prediction with Perceptrons

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Delay-sensitive branch predictors for future technologies

Delay-sensitive branch predictors for future technologies

Branch predictor guided instruction decoding

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Block-aware instruction set architecture

ACM Transactions on Architecture and Code Optimization (TACO)
Enlarging Instruction Streams

IEEE Transactions on Computers
Multiple stream prediction

ISHPC'05/ALPS'06 Proceedings of the 6th international symposium on high-performance computing and 1st international conference on Advanced low power systems

Quantified Score

Hi-index	0.00

Visualization

Abstract

Fetch engine performance is a key topic in superscalar processors, since it limits the instruction-level parallelism that can be exploited by the execution core. In the search of high performance, the fetch engine has evolved toward more efficient designs, but its complexity has also increased.In this paper, we present the stream fetch engine, a novel architecture based on the execution of long streams of sequential instructions, taking maximum advantage of code layout optimizations. We describe our design in detail, showing that it achieves high fetch performance, while requiring less complexity than other state-of-the-art fetch architectures.