Instruction cache locking inside a binary rewriter

  • Authors:
  • Kapil Anand, Rajeev Barua

  • Affiliations:
  • University of Maryland, College Park, MD, USA

  • Venue:
  • CASES '09: Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems
  • Year:
  • 2009


Abstract

Cache memories play an important role in reducing the execution time of applications in embedded systems. Various extensions have been added to cache hardware to enable software involvement in replacement decisions, improving run-time over a purely hardware-managed cache. Modern embedded processors, such as Intel's XScale and the ARM Cortex family, provide the ability to lock one or more lines in the cache; this feature is called cache locking. This paper presents the first method in the literature for instruction-cache locking that is able to reduce the average-case run-time of the program. We devise a cost-benefit model to discover which memory addresses should be locked in the cache. We implement our scheme inside a binary rewriter, widening its applicability to binaries compiled with any compiler. Results on a suite of MiBench and MediaBench benchmarks show up to 25% improvement in the instruction-cache miss rate on average and up to 13.5% improvement in execution time on average for applications whose instruction accesses are a bottleneck, depending on the cache configuration. The improvement in execution time is as high as 23.5% for some benchmarks.
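The cost-benefit idea behind such a scheme can be illustrated with a minimal sketch. This is not the paper's actual algorithm; it assumes hypothetical inputs: a per-block profile estimating the misses avoided if a block were locked, and a single per-line "capacity cost" estimating the extra misses caused elsewhere by giving up one line of cache capacity. A greedy pass then locks only blocks whose benefit exceeds that cost:

```python
# Hedged sketch (not the paper's algorithm): greedy cost-benefit
# selection of instruction-cache blocks to lock. The profile and
# capacity_cost inputs are hypothetical placeholders.

def select_locked_lines(profile, capacity_cost, max_locked):
    """profile: {block_address: estimated misses avoided if locked}.
    capacity_cost: estimated extra misses incurred elsewhere per
    locked line, due to the reduced effective cache capacity.
    max_locked: hardware limit on lockable lines (e.g. ways - 1).
    Returns the block addresses worth locking."""
    # Rank candidates by the misses that locking them would avoid.
    ranked = sorted(profile.items(), key=lambda kv: kv[1], reverse=True)
    locked = []
    for addr, benefit in ranked:
        if len(locked) >= max_locked:
            break
        # Lock only when avoided misses outweigh the capacity loss.
        if benefit > capacity_cost:
            locked.append(addr)
    return locked

# Example: three hot blocks, one cold block, two lockable lines.
prof = {0x8000: 900, 0x8040: 500, 0x8080: 120, 0x80C0: 10}
print(select_locked_lines(prof, capacity_cost=100, max_locked=2))
```

In a real system the benefit and cost terms would come from profiling or static analysis per cache set, since locking a line only displaces blocks mapping to the same set.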