Increasing the instruction fetch rate via multiple branch prediction and a branch address cache
ICS '93 Proceedings of the 7th international conference on Supercomputing
Characterizing the impact of predicated execution on branch prediction
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
The performance potential of data dependence speculation & collapsing
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Branch history table indexing to prevent pipeline bubbles in wide-issue superscalar processors
MICRO 26 Proceedings of the 26th annual international symposium on Microarchitecture
Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
Trading conflict and capacity aliasing in conditional branch predictors
Proceedings of the 24th annual international symposium on Computer architecture
Selective eager execution on the PolyPath architecture
Proceedings of the 25th annual international symposium on Computer architecture
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
The cascaded predictor: economical and adaptive branch target prediction
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Dynamic Hammock Predication for Non-Predicated Instruction Set Architectures
PACT '98 Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques
Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Effective ahead pipelining of instruction block address generation
Proceedings of the 30th annual international symposium on Computer architecture
Hi-index | 0.01 |
The performance of superscalar processors depends on many parameters with correlated effects. This paper explores the relations between some of these parameters, and more particularly, the requirement in instruction fetch bandwidth. We introduce new enhancements to increase the bandwidth of conventional instruction fetch engines. However, experiments show that the performance does not increase proportionally to the fetch. Once the measured IPC is half the instruction fetch bandwidth, increasing the fetch bandwidth brings very little improvement. In order to better understand this behavior, we develop a model from the empirical observation that the available instruction parallelism grows as the square root of the instruction window size. From the model, we derive that the fetch bandwidth requirement grows as the square root of the distance between mispredicted branches. We also verify experimentally that, to double the IPC, one should both double the fetch bandwidth and decrease the number of mispredicted branches fourfold.