Software-based instruction caching for embedded processors

Authors:
Jason E. Miller; Anant Agarwal
Affiliations:
Massachusetts Institute of Technology; Massachusetts Institute of Technology
Venue:
Proceedings of the 12th international conference on Architectural support for programming languages and operating systems
Year:
2006


Abstract

While hardware instruction caches are present in virtually all general-purpose and high-performance microprocessors today, many embedded processors use SRAM or scratchpad memories instead. These are simple array memory structures that are directly addressed and explicitly managed by software. Compared with hardware caches of the same data capacity, they are smaller, have shorter access times, and consume less energy per access. Access times are also easier to predict with simple memories since there is no possibility of a "miss." On the other hand, they are more difficult for the programmer to use since they are not automatically managed.

In this paper, we present a software system that allows all or part of an SRAM or scratchpad memory to be automatically managed as a cache. This system provides the programming convenience of a cache for processors that lack dedicated caching hardware. It has been implemented for an actual processor and runs on real hardware. Our results show that a software-based instruction cache can be built that provides performance within 10% of a traditional hardware cache on many benchmarks while using a cheaper, simpler SRAM. On these same benchmarks, energy consumption is up to 3% lower than it would be using a hardware cache.
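To make the idea of managing a directly addressed memory as a cache more concrete, the following is a minimal sketch of a direct-mapped cache whose tag checks and refills are done entirely in software. It is a generic illustration, not the system described in the paper: the class name `SoftwareCache`, the line and block sizes, and the direct-mapped placement policy are all assumptions chosen for brevity.

```python
class SoftwareCache:
    """Toy direct-mapped cache managed entirely in software (illustrative only)."""

    def __init__(self, backing, num_lines=4, line_words=4):
        # Assumption: len(backing) is a multiple of line_words and every
        # address passed to fetch() is within range.
        self.backing = backing            # stands in for slower off-chip memory
        self.num_lines = num_lines
        self.line_words = line_words
        # The scratchpad is a plain array; software decides what lives in it.
        self.scratchpad = [None] * (num_lines * line_words)
        self.tags = [None] * num_lines    # block number held by each line
        self.hits = 0
        self.misses = 0

    def fetch(self, addr):
        block = addr // self.line_words   # which block of backing memory
        offset = addr % self.line_words   # word within that block
        line = block % self.num_lines     # direct-mapped placement
        base = line * self.line_words
        if self.tags[line] != block:
            # Miss: software copies the whole block into the scratchpad line.
            self.misses += 1
            start = block * self.line_words
            chunk = self.backing[start:start + self.line_words]
            self.scratchpad[base:base + len(chunk)] = chunk
            self.tags[line] = block
        else:
            self.hits += 1
        return self.scratchpad[base + offset]


# Usage: 32 "instruction words"; addresses 0 and 16 map to the same line.
program = list(range(100, 132))
cache = SoftwareCache(program)
values = [cache.fetch(a) for a in (0, 1, 2, 3, 0, 16, 0)]
```

In this sketch every access pays for a tag comparison in software, which is the kind of overhead a practical software cache has to keep small to stay close to hardware-cache performance.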