A predictive decode filter cache for reducing power consumption in embedded processors

  • Authors:
  • Weiyu Tang, Arun Kejariwal, Alexander V. Veidenbaum, Alexandru Nicolau

  • Affiliations:
  • Center for Embedded Computer Systems, University of California, Irvine, Irvine, CA (all authors)

  • Venue:
  • ACM Transactions on Design Automation of Electronic Systems (TODAES)
  • Year:
  • 2007

Abstract

With advances in semiconductor technology, power management has become an increasingly important constraint in processor design. In embedded processors, instruction fetch and decode consume more than 40% of processor power, which calls for power minimization techniques targeting the fetch and decode stages of the pipeline. To this end, a filter cache has been proposed as an architectural extension for reducing power consumption. Placed between the CPU and the instruction cache (I-cache), a filter cache supplies the instruction stream with shorter access time and lower power consumption than the I-cache. Its downside is a possible performance loss on cache misses. In this article, we present a novel technique, the decode filter cache (DFC), for minimizing power consumption with minimal performance impact. The DFC stores decoded instructions, so a hit in the DFC eliminates both the instruction fetch and its subsequent decode, reducing processor power. We present a runtime approach for predicting whether the next fetch source is present in the DFC; when a miss is predicted, the I-cache is accessed directly to reduce the miss penalty. We classify instructions as cacheable or noncacheable depending on their decoded width, and we use a sectored cache design for the DFC so that cacheable and noncacheable instructions can coexist in a DFC sector, making efficient use of the cache space. Experimental results show that the DFC reduces processor power by 34% on average and that our next-fetch prediction mechanism reduces the miss penalty by more than 91%.
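
Since the abstract packs several mechanisms into prose (next-fetch prediction, cacheable/noncacheable classification by decoded width, and the sectored layout), a small sketch may help. The C fragment below is a hypothetical illustration, not the paper's implementation: the sizes, bit splits, and names (`dfc_sector_t`, `next_fetch_source`, `is_cacheable`) are assumptions, and the prediction is approximated by a direct sector probe.

```c
/* Minimal sketch of the DFC fetch-path decision, under assumed parameters.
 * All names, sizes, and bit splits are illustrative, not the paper's. */
#include <stdbool.h>
#include <stdint.h>

#define DFC_SECTORS  64   /* assumed number of DFC sectors */
#define SECTOR_LINES  4   /* assumed lines per sector */
#define MAX_DECODED   8   /* assumed fixed width of a decoded entry, in bytes */

typedef struct {
    uint32_t tag;                     /* one tag per sector (sectored design) */
    bool valid[SECTOR_LINES];         /* per-line valid bits */
    bool cacheable[SECTOR_LINES];     /* line holds a decoded (cacheable) entry */
    uint8_t decoded[SECTOR_LINES][MAX_DECODED];
} dfc_sector_t;

static dfc_sector_t dfc[DFC_SECTORS];

/* Classification: an instruction is cacheable only if its decoded form fits
 * in the fixed per-line width; wider instructions remain noncacheable but
 * can still occupy a line in the same sector. */
bool is_cacheable(unsigned decoded_width) {
    return decoded_width <= MAX_DECODED;
}

typedef enum { SRC_DFC, SRC_ICACHE } fetch_src_t;

/* Next-fetch prediction, approximated here by a direct sector probe:
 * a predicted DFC hit skips both fetch and decode; a predicted miss sends
 * the access straight to the I-cache, avoiding the penalty of probing the
 * DFC first. */
fetch_src_t next_fetch_source(uint32_t pc) {
    dfc_sector_t *s = &dfc[(pc >> 4) % DFC_SECTORS];
    unsigned line = (pc >> 2) % SECTOR_LINES;
    if (s->tag == (pc >> 10) && s->valid[line] && s->cacheable[line])
        return SRC_DFC;     /* decoded instruction available */
    return SRC_ICACHE;      /* fetch and decode normally */
}
```

In the paper the prediction is made ahead of the access rather than by probing the DFC itself; the probe above merely stands in for that mechanism to show how the predicted source steers the pipeline.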