Cluster miss prediction with prefetch on miss for embedded CPU instruction caches

Authors:
Ken Batcher; Robert Walker
Affiliations:
Kent State University, Cisco Systems, Richfield, OH; Kent State University, Kent, OH
Venue:
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Year:
2004

Abstract

Soft CPU cores are often used in embedded systems, but they restrict cache performance improvements to hardware assistance outside the CPU core. Instruction prefetching is commonly used, but the popular Prefetch On Miss (POM) technique is less helpful when instruction flow does not follow sequential execution order, which is often the case in real-time networking applications. Cluster Miss Prediction (CMP) can help in those worst-case situations where cache misses do not follow a sequential order, and it can be combined with POM to provide an effective technique for real-time networking applications on embedded systems. The benefits of the combined CMP+POM technique are illustrated on an industrial embedded networking application using different cache configurations.
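
The limitation of POM described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' model: it assumes a direct-mapped instruction cache, treats POM as "on a miss to line n, also fetch line n+1", and assumes the prefetch completes before the next fetch. The function name `count_misses`, the four-line cache, and both traces are hypothetical and chosen only for illustration. CMP is not modeled, since the abstract does not describe its mechanism.

```python
def count_misses(trace, num_lines=4, prefetch_on_miss=False):
    """Count demand misses in a toy direct-mapped instruction cache.

    `trace` is a sequence of cache-line numbers (address // line size).
    Assumption: a prefetch finishes instantly, before the next fetch.
    """
    cache = [None] * num_lines  # cache[i] holds the line number stored in slot i
    misses = 0
    for line in trace:
        slot = line % num_lines
        if cache[slot] != line:
            misses += 1
            cache[slot] = line  # demand fill of the missing line
            if prefetch_on_miss:
                nxt = line + 1  # POM: also fetch the next sequential line
                cache[nxt % num_lines] = nxt
    return misses


# Hypothetical traces: straight-line code versus a branch-heavy flow.
sequential = list(range(8))
scattered = [0, 5, 2, 7, 12, 9, 14, 3]

for name, trace in (("sequential", sequential), ("scattered", scattered)):
    base = count_misses(trace)
    pom = count_misses(trace, prefetch_on_miss=True)
    print(f"{name}: {base} misses without POM, {pom} with POM")
```

Under these assumptions, POM halves the misses on the sequential trace (8 to 4) and leaves the scattered trace unchanged at 8, because the prefetched next line is never the one that is fetched next. That second case is the kind of non-sequential miss pattern the abstract says CMP is meant to address.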