Software-controlled caches in the VMP multiprocessor
ISCA '86 Proceedings of the 13th annual international symposium on Computer architecture
Mimic: a fast system/370 simulator
SIGPLAN '87 Papers of the Symposium on Interpreters and interpretive techniques
The superblock: an effective technique for VLIW and superscalar compilation
The Journal of Supercomputing - Special issue on instruction-level parallelism
Shade: a fast instruction-set simulator for execution profiling
SIGMETRICS '94 Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems
VCODE: a retargetable, extensible, very fast dynamic code generation system
PLDI '96 Proceedings of the ACM SIGPLAN 1996 conference on Programming language design and implementation
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
DAISY: dynamic compilation for 100% architectural compatibility
Proceedings of the 24th annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Hardware-software trade-offs in a direct Rambus implementation of the RAMpage memory hierarchy
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A fully associative software-managed cache design
Proceedings of the 27th annual international symposium on Computer architecture
Dynamo: a transparent dynamic optimization system
PLDI '00 Proceedings of the ACM SIGPLAN 2000 conference on Programming language design and implementation
ACM Computing Surveys (CSUR)
A proposal to establish a pseudo virtual memory via writable overlays
Communications of the ACM
Operating Systems: Program overlay techniques
Communications of the ACM
Communications of the ACM
Dynamic Binary Translation and Optimization
IEEE Transactions on Computers
Secure Execution via Program Shepherding
Proceedings of the 11th USENIX Security Symposium
Scratchpad memory: design alternative for cache on-chip memory in embedded systems
Proceedings of the tenth international symposium on Hardware/software codesign
DELI: a new run-time control point
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Software-Managed Address Translation
HPCA '97 Proceedings of the 3rd IEEE Symposium on High-Performance Computer Architecture
Assigning Program and Data Objects to Scratchpad for Energy Reduction
Proceedings of the conference on Design, automation and test in Europe
Using Complete Machine Simulation for Software Power Estimation: The SoftWatt Approach
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
Novel Caches for Predictable Computing
Novel Caches for Predictable Computing
Shade: A Fast Instruction Set Simulator for Execution Profiling
Shade: A Fast Instruction Set Simulator for Execution Profiling
Exploring Code Cache Eviction Granularities in Dynamic Optimization Systems
Proceedings of the international symposium on Code generation and optimization: feedback-directed and runtime optimization
Evaluation of the Raw Microprocessor: An Exposed-Wire-Delay Architecture for ILP and Streams
Proceedings of the 31st annual international symposium on Computer architecture
A post-compiler approach to scratchpad mapping of code
Proceedings of the 2004 international conference on Compilers, architecture, and synthesis for embedded systems
Compiler Managed Dynamic Instruction Placement in a Low-Power Code Cache
Proceedings of the international symposium on Code generation and optimization
Dynamic Overlay of Scratchpad Memory for Energy Minimization
CODES+ISSS '04 Proceedings of the international conference on Hardware/Software Codesign and System Synthesis: 2004
Heap data allocation to scratch-pad memory in embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Fragment cache management for dynamic binary translators in embedded systems with scratchpad
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Reducing pressure in bounded DBT code caches
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Predictable programming on a precision timed architecture
CASES '08 Proceedings of the 2008 international conference on Compilers, architectures and synthesis for embedded systems
Hybrid access-specific software cache techniques for the cell BE architecture
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
COMIC: a coherent shared memory interface for cell be
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Dynamic code footprint optimization for the IBM Cell Broadband Engine
IWMSE '09 Proceedings of the 2009 ICSE Workshop on Multicore Software Engineering
Dynamically utilizing computation accelerators for extensible processors in a software approach
CODES+ISSS '09 Proceedings of the 7th IEEE/ACM international conference on Hardware/software codesign and system synthesis
Heterogeneous code cache: using scratchpad and main memory in dynamic binary translators
Proceedings of the 46th Annual Design Automation Conference
PDCN '08 Proceedings of the IASTED International Conference on Parallel and Distributed Computing and Networks
ACM Transactions on Architecture and Code Optimization (TACO)
Customized placement for high performance embedded processor caches
ARCS'07 Proceedings of the 20th international conference on Architecture of computing systems
Cost-effectively offering private buffers in SoCs and CMPs
Proceedings of the international conference on Supercomputing
Buffer-integrated-Cache: a cost-effective SRAM architecture for handheld and embedded platforms
Proceedings of the 48th Design Automation Conference
An exploration of mechanisms for dynamic cryptographic instruction set extension
CHES'11 Proceedings of the 13th international conference on Cryptographic hardware and embedded systems
Memory optimization of dynamic binary translators for embedded systems
ACM Transactions on Architecture and Code Optimization (TACO)
Enabling dynamic binary translation in embedded systems with scratchpad memory
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
While hardware instruction caches are present in virtually all general-purpose and high-performance microprocessors today, many embedded processors use SRAM or scratchpad memories instead. These are simple array memory structures that are directly addressed and explicitly managed by software. Compared to hardware caches of the same data capacity, they are smaller, have shorter access times and consume less energy per access. Access times are also easier to predict with simple memories since there is no possibility of a "miss." On the other hand, they are more difficult for the programmer to use since they are not automatically managed.In this paper, we present a software system that allows all or part of an SRAM or scratchpad memory to be automatically managed as a cache. This system provides the programming convenience of a cache for processors that lack dedicated caching hardware. It has been implemented for an actual processor and runs on real hardware. Our results show that a software-based instruction cache can be built that provides performance within 10% of a traditional hardware cache on many benchmarks while using a cheaper, simpler, SRAM memory. On these same benchmarks, energy consumption is up to 3% lower than it would be using a hardware cache.