Microprocessors & Microsystems
With the explosive proliferation of embedded systems, especially in portable devices and wireless equipment, these systems have become indispensable to modern society and everyday life. Such devices are often battery powered, so low energy consumption in embedded processors is important and becomes more critical as system complexity grows. The on-chip instruction cache (I-cache) is usually the most energy-consuming component on the processor chip because of its large size and frequent accesses. To reduce this energy consumption, existing loop cache approaches use a tiny decoded cache to filter I-cache accesses and instruction decoding for repeated loop iterations. However, such designs are effective only for small, simple loops and suit only DSP kernel-like applications; they are not effective for the many embedded applications in which complex loops are common. In this article, we propose a decoded loop instruction cache (DLIC) that is small, and hence energy efficient, yet can capture most loops, including large nested loops with branches, so that a significant share of I-cache accesses and instruction decoding can be eliminated. Experiments on a set of embedded benchmarks show that the proposed DLIC scheme reduces energy consumption by up to 87% compared with a normal cache-only design. On average, it saves 66% of the energy spent on instruction fetching and decoding, at a performance overhead of only 1.4%.