Inexpensive implementations of set-associativity
ISCA '89 Proceedings of the 16th annual international symposium on Computer architecture
Circuit implementation of a 300-MHz 64-bit second-generation CMOS Alpha CPU
Digital Technical Journal - Special 10th anniversary issue
Energy optimization of multi-level processor cache architectures
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
Reducing the frequency of tag compares for low power I-cache design
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
CAT—caching address tags: a technique for reducing area cost of on-chip caches
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Increasing cache port efficiency for dynamic superscalar microprocessors
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
The filter cache: an energy efficient memory structure
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Selective cache ways: on-demand cache resource allocation
Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations
Proceedings of the 27th annual international symposium on Computer architecture
Region-based caching: an energy-delay efficient memory architecture for embedded processors
CASES '00 Proceedings of the 2000 international conference on Compilers, architecture, and synthesis for embedded systems
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Tarantula: a vector extension to the alpha architecture
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Cache designs for energy efficiency
HICSS '95 Proceedings of the 28th Hawaii International Conference on System Sciences
Design and analysis of low-power cache using two-level filter scheme
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Hi-index | 0.00 |
We revisit the idea of using small line buffers in-front of caches. We propose ReCast, a tiny tag set cache that filters a significant number of tag probes to the L2 tag array thus reducing power. The key contribution in ReCast is S-Shift, a simple indexing function (no logic involved just wires) that greatly improves the utility of line buffers with no additional hardware cost. S-Shift can be viewed as a technique for emulating larger cache blocks and hence exploiting more spatial locality but without paying the penalties of actually using a larger L2 cache block. Using several SPEC CPU2000 applications and a model of an aggressive, dynamicallyscheduled, superscalar processor we demonstrate that a practical ReCast organization can significantly reduce power in the L2. Specifically, a 64-entry ReCast comprising eight sub-banks of eight entries each can filter about 50% of all tag probes for a 1Mbyte L2 cache. A conventional line buffer of the same size filters only 32% of all tag probes. The resulting average reduction in L2 tag power is 38% and 85% with writeback or writethrough L1 caches respectively. This translates to a reduction of 16%or 52% of the overall L2 power respectively. We also analyze a few representative applications explaining why S-Shift works well.