Highly concurrent scalar processing
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Branch folding in the CRISP microprocessor: reducing branch delay to zero
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
An evaluation of branch architectures
ISCA '87 Proceedings of the 14th annual international symposium on Computer architecture
Limits on multiple instruction issue
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Journal of Parallel and Distributed Computing
MICRO 15 Proceedings of the 15th annual workshop on Microprogramming
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Branch strategies: modeling and optimization
Branch strategies: modeling and optimization
Branch with masked squashing in superpipelined processors
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Minimizing branch misprediction penalties for superpipelined processors
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
A Practical Methodology for the Formal Verification of RISC Processors
Formal Methods in System Design
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
Instruction Window Size Trade-Offs and Characterization of Program Parallelism
IEEE Transactions on Computers
Hi-index | 14.99 |
The authors provide a common platform for modeling different schemes for reducing the branch-delay penalty in pipelined processors as well as evaluating the associated increased instruction bandwidth. Their objective is twofold: to develop a model for different approaches to the branch problem and to help select an optimal strategy after taking into account additional i-traffic generation by branch strategies. The model presented provides a flexible tool for comparing different branch strategies in terms of the reduction it offers in average branch delay and also in terms of the associated cost of wasted instruction fetches. This additional criterion turns out to be a valuable consideration in choosing between two strategies that perform almost equally. More importantly, it provides a better insight into the expected overall system performance. Simple compiler-support-based low-implementation-cost strategies can be very effective under certain conditions. An active branch prediction scheme based on loop buffers can be as competitive as a branch-target-buffer based strategy.