With advances in semiconductor technology, power has become an increasingly important design constraint in processor design. In embedded processors, instruction fetch and decode consume more than 40% of processor power, which calls for power-minimization techniques targeting the fetch and decode stages of the pipeline. To this end, the filter cache has been proposed as an architectural extension for reducing power consumption: a small cache placed between the CPU and the instruction cache (I-cache) that supplies the instruction stream with shorter access time and lower power consumption. Its downside is a possible performance loss on cache misses. In this article, we present a novel technique, the decode filter cache (DFC), for minimizing power consumption with minimal performance impact. The DFC stores decoded instructions, so a hit in the DFC eliminates both the instruction fetch and its subsequent decoding, reducing processor power. We present a runtime approach for predicting whether the next fetch source is present in the DFC; when a miss is predicted, we reduce the miss penalty by accessing the I-cache directly. We classify instructions as cacheable or noncacheable depending on their decode width, and use a sectored cache design for the DFC so that cacheable and noncacheable instructions can coexist within a DFC sector, making efficient use of the cache space. Experimental results show that the DFC reduces processor power by 34% on average and that our next-fetch prediction mechanism reduces the miss penalty by more than 91%.
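As a rough behavioral sketch (not the authors' implementation), the fetch path described above can be modeled in a few lines: a sectored DFC holds decoded instructions under one tag per sector with per-word valid bits, a next-fetch predictor decides whether to probe the DFC at all, and predicted misses go straight to the I-cache. All class, function, and parameter names here are illustrative assumptions; sector geometry and the cacheability test are placeholders.

```python
class DecodeFilterCache:
    """Toy direct-mapped sectored cache: one tag per sector, a valid bit
    per word, so cacheable and noncacheable slots coexist in a sector.
    Sizes are illustrative, not the paper's configuration."""

    def __init__(self, num_sectors=16, sector_size=4):
        self.num_sectors = num_sectors
        self.sector_size = sector_size
        self.tags = [None] * num_sectors
        self.valid = [[False] * sector_size for _ in range(num_sectors)]
        self.data = [[None] * sector_size for _ in range(num_sectors)]

    def _index(self, pc):
        sector = pc // self.sector_size
        return sector % self.num_sectors, sector, pc % self.sector_size

    def lookup(self, pc):
        idx, tag, off = self._index(pc)
        if self.tags[idx] == tag and self.valid[idx][off]:
            return self.data[idx][off]  # hit: decoded form, skip fetch+decode
        return None

    def fill(self, pc, decoded, cacheable):
        idx, tag, off = self._index(pc)
        if self.tags[idx] != tag:  # allocate sector under a single tag
            self.tags[idx] = tag
            self.valid[idx] = [False] * self.sector_size
        if cacheable:  # noncacheable slots simply stay invalid
            self.valid[idx][off] = True
            self.data[idx][off] = decoded


def is_cacheable(decoded):
    # Placeholder classifier: the paper classifies by decode width; here we
    # assume every decoded instruction fits and is therefore cacheable.
    return True


def fetch(pc, dfc, predict_dfc_hit, icache_fetch, decode):
    """Probe the DFC only when the predictor expects a hit; on a predicted
    miss, go directly to the I-cache to avoid the DFC miss penalty."""
    if predict_dfc_hit(pc):
        decoded = dfc.lookup(pc)
        if decoded is not None:
            return decoded  # both fetch and decode bypassed
    raw = icache_fetch(pc)  # normal fetch path
    decoded = decode(raw)
    dfc.fill(pc, decoded, cacheable=is_cacheable(decoded))
    return decoded
```

With a trivial I-cache and decoder stub, a second access to the same PC returns from the DFC without touching the I-cache, which is the power-saving effect the abstract describes.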