A design framework to efficiently explore energy-delay tradeoffs
Proceedings of the ninth international symposium on Hardware/software codesign
Data and memory optimization techniques for embedded systems
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Sentry tag: an efficient filter scheme for low power cache
CRPIT '02 Proceedings of the seventh Asia-Pacific conference on Computer systems architecture
Memory Architectures for Embedded Systems-On-Chip
HiPC '02 Proceedings of the 9th International Conference on High Performance Computing
Compiler optimization on VLIW instruction scheduling for low power
ACM Transactions on Design Automation of Electronic Systems (TODAES)
A system-level methodology for fast multi-objective design space exploration
Proceedings of the 13th ACM Great Lakes symposium on VLSI
Adaptive mode control: A static-power-efficient cache design
ACM Transactions on Embedded Computing Systems (TECS)
Proceedings of the 2003 ACM symposium on Applied computing
Design and analysis of low-power cache using two-level filter scheme
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Micro-operation cache: a power aware frontend for variable instruction length ISA
IEEE Transactions on Very Large Scale Integration (VLSI) Systems - Special section on low power
Proceedings of the 2004 ACM symposium on Applied computing
Code Placement with Selective Cache Activity Minimization for Embedded Real-time Software Design
Proceedings of the 2003 IEEE/ACM international conference on Computer-aided design
HotSpot cache: joint temporal and spatial locality exploitation for i-cache energy reduction
Proceedings of the 2004 international symposium on Low power electronics and design
Power-Performance System-Level Exploration of a MicroSPARC2-Based Embedded Architecture
DATE '03 Proceedings of the conference on Design, Automation and Test in Europe: Designers' Forum - Volume 2
A sink-n-hoist framework for leakage power reduction
Proceedings of the 5th ACM international conference on Embedded software
Compilers for leakage power reduction
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Multi-objective design space exploration of embedded systems
Journal of Embedded Computing - Low-power Embedded Systems
Compilation for compact power-gating controls
ACM Transactions on Design Automation of Electronic Systems (TODAES)
Instruction cache energy saving through compiler way-placement
Proceedings of the conference on Design, automation and test in Europe
Branch target buffer design for embedded processors
Microprocessors & Microsystems
Enabling large decoded instruction loop caching for energy-aware embedded processors
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Microprocessors & Microsystems
Compiler analysis and supports for leakage power reduction on microprocessors
LCPC'02 Proceedings of the 15th international conference on Languages and Compilers for Parallel Computing
Link-time optimization for power efficiency in a tagless instruction cache
CGO '11 Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization
Towards a performance- and energy-efficient data filter cache
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Power devil: tool for power gating strategy selection
Proceedings of the 10th Workshop on Optimizations for DSP and Embedded Systems
Shrinking l1 instruction caches to improve energy: delay in SMT embedded processors
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
DLIC: Decoded loop instructions caching for energy-aware embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Hi-index | 0.00 |
In this paper, we focus on low-power design techniques for high-performance processors at the architectural and compiler levels. We focus mainly on developing methods for reducing the energy dissipated in the on-chip caches. Energy dissipated in caches represents a substantial portion in the energy budget of today's processors. Extrapolating current trends, this portion is likely to increase in the near future, since the devices devoted to the caches occupy an increasingly larger percentage of the total area of the chip. We propose a method that uses an additional minicache located between the I-Cache and the central processing unit (CPU) core and buffers instructions that are nested within loops and are continuously otherwise fetched from the I-Cache. This mechanism is combined with code modifications, through the compiler, that greatly simplify the required hardware, eliminate unnecessary instruction fetching, and consequently reduce signal switching activity and the dissipated energy. We show that the additional cache, dubbed L-Cache, is much smaller and simpler than the I-Cache when the compiler assumes the role of allocating instructions to it. Through simulation, we show that for the SPECfp95 benchmarks, the I-Cache remains disabled most of the time, and the "cheaper" extra cache is used instead. We also propose different techniques that are better adapted to nonnumeric nonloop-intensive code.