Program optimization for instruction caches
ASPLOS III Proceedings of the third international conference on Architectural support for programming languages and operating systems
Design and evaluation of a compiler algorithm for prefetching
ASPLOS V Proceedings of the fifth international conference on Architectural support for programming languages and operating systems
ACM Computing Surveys (CSUR)
The Effect of Code Expanding Optimizations on Instruction Cache Design
IEEE Transactions on Computers
A study of branch prediction strategies
ISCA '81 Proceedings of the 8th annual symposium on Computer Architecture
Experimental evaluation of on-chip microprocessor cache memories
ISCA '84 Proceedings of the 11th annual international symposium on Computer architecture
Execution history guided instruction prefetching
ICS '02 Proceedings of the 16th international conference on Supercomputing
A Model of a Microprocessor with a Wide Command Word
Cybernetics and Systems Analysis
Content-Based Prefetching: Initial Results
IMS '00 Revised Papers from the Second International Workshop on Intelligent Memory Systems
Call graph prefetching for database applications
ACM Transactions on Computer Systems (TOCS)
Execution History Guided Instruction Prefetching
The Journal of Supercomputing
Cluster miss prediction for instruction caches in embedded networking applications
Proceedings of the 14th ACM Great Lakes symposium on VLSI
Cluster miss prediction with prefetch on miss for embedded CPU instruction caches
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Exploring the limits of prefetching
IBM Journal of Research and Development - Electrochemical technology in microelectronics
An effective instruction cache prefetch policy by exploiting cache history information
EUC'05 Proceedings of the 2005 international conference on Embedded and Ubiquitous Computing
Cost Minimization with HPDFG and Data Mining for Heterogeneous DSP
Journal of Signal Processing Systems
Hi-index | 14.98 |
Prefetching methods for instruction caches are studied via trace-driven simulation. The two primary methods are "fall-through" prefetch (sometimes referred to as "one block lookahead") and "target" prefetch. Fall-through prefetches are for sequential line accesses, and a key parameter is the distance from the end of the current line where the prefetch for the next line is initiated. Target prefetches work also for nonsequential line accesses. A prediction table is used and a key aspect is the prediction algorithm implemented by the table. Fall-through prefetch and target prefetch each improve performance significantly. When combined in a hybrid algorithm, their performance improvement is nearly additive. An instruction cache using a combined target and fall-through method can provide the same performance as a two to four times larger cache that does not prefetch. A good prediction method must not only be accurate, but prefetches must be initiated early enough to allow time for the instructions to return from main memory. To quantify this, we define a "prefetch efficiency" measure that reflects the amount of memory fetch delay that may be successfully hidden by prefetching. The better prefetch methods (in terms of miss rate) also have very high efficiencies, hiding approximately 90 percent of the miss delay for prefetched lines. Another performance measure of interest is memory traffic. Without prefetching, large line sizes give better hit rates; with prefetching, small line sizes tend to give better overall hit rates. Because smaller line sizes tend to reduce memory traffic, the top-performing prefetch caches produce less memory traffic than the top-performing nonprefetch caches of the same size.