Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
A comprehensive instruction fetch mechanism for a processor supporting speculative execution
MICRO 25 Proceedings of the 25th annual international symposium on Microarchitecture
Prefetching in supercomputer instruction caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Fast and accurate instruction fetch and branch prediction
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Evaluating stream buffers as a secondary cache replacement
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
Two-level adaptive branch prediction and instruction fetch mechanisms for high performance superscalar processors
Reducing branch costs via branch alignment
ASPLOS VI Proceedings of the sixth international conference on Architectural support for programming languages and operating systems
Optimization of instruction fetch mechanisms for high issue rates
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Instruction fetching: coping with code bloat
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Multiple-block ahead branch predictors
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Value locality and load value prediction
Proceedings of the seventh international conference on Architectural support for programming languages and operating systems
Integrating a misprediction recovery cache (MRC) into a superscalar pipeline
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Trace cache: a low latency approach to high bandwidth instruction fetching
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Increasing the instruction fetch rate via block-structured instruction set architectures
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Exceeding the dataflow limit via value prediction
Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture
Complexity-effective superscalar processors
Proceedings of the 24th annual international symposium on Computer architecture
Target prediction for indirect jumps
Proceedings of the 24th annual international symposium on Computer architecture
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The predictability of data values
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Highly accurate data value prediction using hybrid predictors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
The potential of data value speculation to boost ILP
ICS '98 Proceedings of the 12th international conference on Supercomputing
Predictive techniques for aggressive load speculation
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
Improving prediction for procedure returns with return-address-stack repair mechanisms
MICRO 31 Proceedings of the 31st annual ACM/IEEE international symposium on Microarchitecture
A scalable front-end architecture for fast instruction delivery
ISCA '99 Proceedings of the 26th annual international symposium on Computer architecture
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Clock rate versus IPC: the end of the road for conventional microarchitectures
Proceedings of the 27th annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
Predictor-directed stream buffers
Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Branch Target Buffer Design and Optimization
IEEE Transactions on Computers
Effective Hardware-Based Data Prefetching for High-Performance Processors
IEEE Transactions on Computers
HPCA '95 Proceedings of the 1st IEEE Symposium on High-Performance Computer Architecture
Exploring Instruction-Fetch Bandwidth Requirement in Wide-Issue Superscalar Processors
PACT '99 Proceedings of the 1999 International Conference on Parallel Architectures and Compilation Techniques
Instruction prefetching using branch prediction information
ICCD '97 Proceedings of the 1997 International Conference on Computer Design (ICCD '97)
The Effects of Mispredicted-Path Execution on Branch Prediction Structures
PACT '96 Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques
A Decoupled Predictor-Directed Stream Prefetching Architecture
IEEE Transactions on Computers
High Performance and Energy Efficient Serial Prefetch Architecture
ISHPC '02 Proceedings of the 4th International Symposium on High Performance Computing
Using a serial cache for energy efficient instruction fetching
Journal of Systems Architecture: the EUROMICRO Journal
A Hybrid Analysis of an Optimization Approach for Cluster Applications
The Journal of Supercomputing
Energy-efficient and high-performance instruction fetch using a block-aware ISA
ISLPED '05 Proceedings of the 2005 international symposium on Low power electronics and design
An Efficient Way of Passing of Data in a Multithreaded Scheduled Dataflow Architecture
HPCASIA '05 Proceedings of the Eighth International Conference on High-Performance Computing in Asia-Pacific Region
Block-aware instruction set architecture
ACM Transactions on Architecture and Code Optimization (TACO)
Wide and efficient trace prediction using the local trace predictor
Proceedings of the 20th annual international conference on Supercomputing
A latency-conscious SMT branch prediction architecture
International Journal of High Performance Computing and Networking
Temporal instruction fetch streaming
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Improving instruction delivery with a block-aware ISA
Euro-Par'05 Proceedings of the 11th international Euro-Par conference on Parallel Processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 14.98 |
In the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictions, and taken branches in the instruction stream. To counter these challenges, we present a fetch architecture that decouples the branch predictor from the instruction fetch unit. A Fetch Target Queue (FTQ) is inserted between the branch predictor and instruction cache. This allows the branch predictor to run far in advance of the address currently being fetched by the cache. The decoupling enables a number of architecture optimizations, including multilevel branch predictor design, fetch-directed instruction prefetching, and easier pipelining of the instruction cache. For the multilevel predictor, we show that it performs better than a single-level predictor, even when ignoring the effects of cycle-timing issues. We also examine the performance of fetch-directed instruction prefetching using a multilevel branch predictor and show that an average 19 percent speedup is achieved. In addition, we examine pipelining the instruction cache to achieve a faster cycle time for the processor pipeline and show that pipelining provides an average 27 percent speedup over not pipelining the instruction cache for the programs examined.