DLIC: Decoded loop instructions caching for energy-aware embedded processors

Authors:
Ji Gu; Hui Guo; Tohru Ishihara
Affiliations:
Kyoto University, Kyoto, Japan; The University of New South Wales, Sydney, NSW, Australia; Kyoto University, Kyoto, Japan
Venue:
ACM Transactions on Embedded Computing Systems (TECS)
Year:
2013

Abstract

Embedded systems have proliferated rapidly, especially in portable devices and wireless equipment, and have become indispensable to modern society and daily life. These devices are often battery driven, so low energy consumption in embedded processors is important and becomes more critical as system complexity grows. The on-chip instruction cache (I-cache) is usually the most energy-consuming component on the processor chip because of its large size and frequent accesses. To reduce this energy consumption, existing loop cache approaches use a tiny decoded cache to filter I-cache accesses and instruction decode activity for repeated loop iterations. However, such designs are effective only for small, simple loops and are suitable only for DSP kernel-like applications; they are not effective for the many embedded applications in which complex loops are common. In this article, we propose a decoded loop instruction cache (DLIC) that is small, and hence energy efficient, yet can capture most loops, including large nested loops that contain branches, so that a significant number of I-cache accesses and instruction decodes can be eliminated. Experiments on a set of embedded benchmarks show that the proposed DLIC scheme reduces energy consumption by up to 87% compared with a conventional cache-only design. On average, it saves 66% of the energy spent on instruction fetching and decoding, at a performance overhead of only 1.4%.
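The general loop-buffer idea that the abstract attributes to existing loop cache approaches can be illustrated with a toy model. The sketch below is not the DLIC design and not any published loop cache; it is a minimal simulation under stated assumptions: instructions sit at consecutive integer addresses, a loop is detected when the next address is less than or equal to the previous one, and a loop is buffered only if its whole body fits in the buffer. The function name `simulate_loop_buffer` and its parameters are hypothetical.

```python
def simulate_loop_buffer(trace, capacity):
    """Count I-cache accesses and buffer hits for a toy loop buffer.

    trace:    executed instruction addresses, in order (assumed integers,
              one address per instruction).
    capacity: number of decoded instructions the buffer can hold.
    Returns (icache_accesses, buffer_hits).
    """
    buffer = set()          # addresses whose decoded form is buffered
    loop = None             # (first_addr, last_addr) of the tracked loop
    icache_accesses = 0
    buffer_hits = 0
    prev = None

    for pc in trace:
        # Assumption: a jump to an address <= the previous one closes a loop.
        if prev is not None and pc <= prev:
            candidate = (pc, prev)
            if candidate != loop:
                buffer.clear()
                # Track the loop only if its whole body fits in the buffer.
                loop = candidate if prev - pc + 1 <= capacity else None

        if loop is not None and loop[0] <= pc <= loop[1]:
            if pc in buffer:
                buffer_hits += 1        # served decoded, no I-cache access
            else:
                icache_accesses += 1    # fetch and decode once, then keep
                buffer.add(pc)
        else:
            icache_accesses += 1        # outside any buffered loop
        prev = pc

    return icache_accesses, buffer_hits


if __name__ == "__main__":
    # One prologue instruction, a 3-instruction loop run 10 times, one epilogue.
    trace = [0] + [1, 2, 3] * 10 + [4]
    print(simulate_loop_buffer(trace, capacity=4))  # loop fits in the buffer
    print(simulate_loop_buffer(trace, capacity=2))  # loop is too large
```

In this toy model, a loop that fits in the buffer is fetched from the I-cache during its first two iterations (one to detect it, one to fill the buffer) and served from the buffer afterwards, while a loop larger than the buffer gets no benefit. That second case mirrors the limitation the abstract describes for small loop caches; how DLIC itself captures large nested loops is not specified here and is not modeled.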