The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor
Digital Technical Journal
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Filtering Memory References to Increase Energy Efficiency
IEEE Transactions on Computers
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Using SimPoint for accurate and efficient simulation
SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Power-Aware Branch Prediction: Characterization and Design
IEEE Transactions on Computers
Single-vDD and single-vT super-drowsy techniques for low-leakage high-performance instruction caches
Proceedings of the 2004 international symposium on Low power electronics and design
Guaranteeing Hits to Improve the Efficiency of a Small Instruction Cache
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Instruction cache energy saving through compiler way-placement
Proceedings of the conference on Design, automation and test in Europe
Hi-index | 0.01 |
Set-associative instruction caches achieve low miss rates at the expense of significant energy dissipation. Previous energy-efficient approaches usually suffer from performance degradation and redundant extension bits. In this paper, we propose a Way History Oriented Low Energy Instruction Cache (WHOLE-Cache) design for single issue and in-order execution processors. The WHOLE-Cache design not only achieves a significant portion of energy reduction by effectively reducing dynamic energy dissipation of set-associative instruction cache, but also leads to no additional cycle penalties. Tag comparison results are stored into either the Branch Target Buffer (BTB) or the Instruction Cache (I-Cache) to avoid tag checks and unnecessary way activation for subsequent accesses to visited cache lines. The extended BTB uses way history bits for branch instructions, while the I-Cache extension bits are used in case of fetching consecutive instructions resided in different cache lines. A valid flag is associated with each stored tag comparison result to indicate whether the instruction to be fetched is resided in the recorded location. A simple invalidation scheme is implemented in the cache miss replacement operation. Whenever a cache line is replaced, the pointers to it, which reside in the BTB or other I-cache lines, will be invalidated accordingly. We model the WHOLE-Cache design in Verilog. By deriving basic parameters from TSMC 65nm technology, we use Wattch simulator to evaluate the performance and energy reduction of the WHOLE-Cache in the instruction fetch stage. We use SPEC2000 and Mediabench as benchmarks. It is observed that compared with a conventional 4-way set-associative I-Cache, the energy consumption of the WHOLE-Cache is reduced by 65% without any performance penalty.