A Case for Direct-Mapped Caches
Computer
Improving performance of small on-chip instruction caches
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
An architecture for software-controlled data prefetching
ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Reducing memory latency via non-blocking and prefetching caches
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
Prefetching in supercomputer instruction caches
Proceedings of the 1992 ACM/IEEE conference on Supercomputing
Designing the TFP Microprocessor
IEEE Micro
Cache behavior in the presence of speculative execution: the benefits of misprediction
Cache behavior in the presence of speculative execution: the benefits of misprediction
Instruction cache fetch policies for speculative execution
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Informing memory operations: providing memory performance feedback in modern processors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
ACM Computing Surveys (CSUR)
The Effect of Speculative Execution on Cache Performance
Proceedings of the 8th International Symposium on Parallel Processing
Lockup-free instruction fetch/prefetch cache organization
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Aspects of cache memory and instruction buffer performance
Aspects of cache memory and instruction buffer performance
Improving data cache performance by pre-executing instructions under a cache miss
ICS '97 Proceedings of the 11th international conference on Supercomputing
Multipath execution: opportunities and limits
ICS '98 Proceedings of the 12th international conference on Supercomputing
Pipeline gating: speculation control for energy reduction
Proceedings of the 25th annual international symposium on Computer architecture
Fetch directed instruction prefetching
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
IEEE Transactions on Computers
Procedure placement using temporal-ordering information
ACM Transactions on Programming Languages and Systems (TOPLAS)
Architectural and compiler support for effective instruction prefetching: a cooperative approach
ACM Transactions on Computer Systems (TOCS)
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
Exploiting the Prefetching Effect Provided by Executing Mispredicted Load Instructions
Euro-Par '02 Proceedings of the 8th International Euro-Par Conference on Parallel Processing
Accurate timing analysis by modeling caches, speculation and their interaction
Proceedings of the 40th annual Design Automation Conference
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Modeling control speculation for timing analysis
Real-Time Systems
The Impact of Incorrectly Speculated Memory Operations in a Multithreaded Architecture
IEEE Transactions on Parallel and Distributed Systems
Effective Instruction Prefetching via Fetch Prestaging
IPDPS '05 Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium (IPDPS'05) - Papers - Volume 01
Understanding the effects of wrong-path memory references on processor performance
WMPI '04 Proceedings of the 3rd workshop on Memory performance issues: in conjunction with the 31st international symposium on computer architecture
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
IEEE Transactions on Computers
WCET analysis of instruction caches with prefetching
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
The impact of wrong-path memory references in cache-coherent multiprocessor systems
Journal of Parallel and Distributed Computing
Analyzing the worst-case execution time for instruction caches with prefetching
ACM Transactions on Embedded Computing Systems (TECS)
The instruction register file micro-architecture
Future Generation Computer Systems - Special issue: Parallel computing technologies
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
IPDPS'06 Proceedings of the 20th international conference on Parallel and distributed processing
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Reducing Power and Energy Overhead in Instruction Prefetching for Embedded Processor Systems
International Journal of Handheld Computing Research
Reconciling real-time guarantees and energy efficiency through unlocked-cache prefetching
Proceedings of the 50th Annual Design Automation Conference
RDIP: return-address-stack directed instruction prefetching
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.01 |
Instruction cache misses can severely limit the performance of both superscalar processors and high speed sequential machines. Instruction prefetch algorithms attempt to reduce the performance degradation by bringing lines into the instruction cache before they are needed by the CPU fetch unit. There have been several algorithms proposed to do this, most notably next line prefetching and target prefetching. We propose a new scheme called wrong-path prefetching which combines next-line prefetching with the prefetching of all control instruction targets regardless of the predicted direction of conditional branches. The algorithm substantially reduces the cycles lost to instruction cache misses while somewhat increasing the amount of memory traffic. Wrong-path prefetching performs better than the other prefetch algorithms studied in all of the cache configurations examined while requiring little additional hardware. For example, the best wrong-path prefetch algorithm can result in a speed up of 16% when using an 8K instruction cache. In fact, an 8K wrong-path prefetched instruction cache is shown to achieve the same miss rate as a 32K non-prefetch cache. Finally, it is shown that wrong-path prefetching is applicable to both multi-issue and long L1 miss latency machines.