Modeled and Measured Instruction Fetching Performance for Superscalar Microprocessors

Authors:
Steven Wallace;Nader Bagherzadeh
Affiliations:
Univ. of California at Irvine, Irvine;Univ. of California at Irvine, Irvine
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
1998

Citing 9
Cited 5

Limits on multiple instruction issue

ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Machine organization of the IBM RISC System/6000 processor

IBM Journal of Research and Development
Increasing the instruction fetch rate via multiple branch prediction and a branch address cache

ICS '93 Proceedings of the 7th international conference on Supercomputing
Shade: a fast instruction-set simulator for execution profiling

SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
Fast and accurate instruction fetch and branch prediction

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Reducing branch costs via branch alignment

ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates

ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
The Metaflow Architecture

IEEE Micro
Hardware and software mechanisms for instruction fetch prediction

Hardware and software mechanisms for instruction fetch prediction

Optimizing Overall Loop Schedules Using Prefetching and Partitioning

IEEE Transactions on Parallel and Distributed Systems
Minimizing Average Schedule Length under Memory Constraints by Optimal Partitioning and Prefetching

Journal of VLSI Signal Processing Systems
Design of Instruction Stream Buffer with Trace Support for X86 Processors

ICCD '00 Proceedings of the 2000 IEEE International Conference on Computer Design: VLSI in Computers & Processors
Iterational retiming with partitioning: Loop scheduling with complete memory latency hiding

ACM Transactions on Embedded Computing Systems (TECS)
Trace Cache Miss Rate

International Journal of Modelling and Simulation

Quantified Score

Hi-index	0.00

Visualization

Abstract

Instruction fetching is critical to the performance of a superscalar microprocessor. We develop a mathematical model for three different cache techniques and evaluate its performance both in theory and in simulation using the SPEC95 suite of benchmarks. In all the techniques, the fetching performance is dramatically lower than ideal expectations. To help remedy the situation, we also evaluate its performance using prefetching. Nevertheless, fetching performance is fundamentally limited by control transfers. To solve this problem, we introduce a new fetching mechanism called a dual branch target buffer. The dual branch target buffer enables fetching performance to leap beyond the limitation imposed by conventional methods and achieve a high instruction fetching rate.