Bus-invert coding for low-power I/O
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Instruction buffering to reduce power in processors for signal processing
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special issue on low power electronics and design
Iterative cache simulation of embedded CPUs with trace stripping
CODES '99 Proceedings of the seventh international workshop on Hardware/software codesign
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
A power reduction technique with object code merging for application specific embedded processors
DATE '00 Proceedings of the conference on Design, automation and test in Europe
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
Address bus encoding techniques for system-level power optimization
Proceedings of the conference on Design, automation and test in Europe
Irredundant address bus encoding for low power
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Enhancing loop buffering of media and telecommunications applications using low-overhead predication
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 2001 IEEE/ACM international conference on Computer-aided design
SH3: High Code Density, Low Power
IEEE Micro
Synthesis of customized loop caches for core-based embedded systems
Proceedings of the 2002 IEEE/ACM international conference on Computer-aided design
Energy and Performance Improvements in Microprocessor Design Using a Loop Cache
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Improving Program Efficiency by Packing Instructions into Registers
Proceedings of the 32nd annual international symposium on Computer Architecture
Reducing Instruction Fetch Cost by Packing Instructions into RegisterWindows
Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
Adapting compilation techniques to enhance the packing of instructions into registers
CASES '06 Proceedings of the 2006 international conference on Compilers, architecture and synthesis for embedded systems
A cache design for high performance embedded systems
Journal of Embedded Computing - Cache exploitation in embedded systems
Addressing instruction fetch bottlenecks by using an instruction register file
Proceedings of the 2007 ACM SIGPLAN/SIGBED conference on Languages, compilers, and tools for embedded systems
A low power front-end for embedded processors using a block-aware instruction set
CASES '07 Proceedings of the 2007 international conference on Compilers, architecture, and synthesis for embedded systems
Variable-sized object packing and its applications to instruction cache design
Computers and Electrical Engineering
HitME: low power Hit MEmory buffer for embedded systems
Proceedings of the 2009 Asia and South Pacific Design Automation Conference
Compiler Support for Code Size Reduction Using a Queue-Based Processor
Transactions on High-Performance Embedded Architectures and Compilers II
Fine-grain dynamic instruction placement for L0 scratch-pad memory
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Hi-index | 0.00 |
Instruction caches have traditionally been used to improve software performance. Recently, several tiny instruction cache designs, including filter caches and dynamic loop caches, have been proposed to instead reduce software power. We propose several new tiny instruction cache designs, including preloaded loop caches, and one-level and two-level hybrid dynamic/preloaded loop caches. We evaluate the existing and proposed designs on embedded system software benchmarks from both the Powerstone and MediaBench suites, on two different processor architectures, for a variety of different technologies. We show on average that filter caching achieves the best instruction fetch energy reductions of 60--80%, but at the cost of about 20% performance degradation, which could also affect overall energy savings. We show that dynamic loop caching gives good instruction fetch energy savings of about 30%, but that if a designer is able to profile a program, preloaded loop caching can more than double the savings. We describe automated methods for quickly determining the best loop cache configuration, methods useful in a core-based design flow.