Snug set-associative caches: Reducing leakage power of instruction and data caches with no performance penalties

Authors:
Yuan-Shin Hwang;Jia-Jhe Li
Affiliations:
National Taiwan Ocean University, Keelung, Taiwan;National Tsing Hua University, Hsinchu, Taiwan
Venue:
ACM Transactions on Architecture and Code Optimization (TACO)
Year:
2007

Abstract

As transistors keep shrinking and on-chip caches keep growing, static power dissipation due to cache leakage accounts for an increasing fraction of total processor power. Several techniques have been proposed to reduce leakage power by turning off unused cache lines, but they all pay a price in performance. This paper presents a cache architecture, the snug set-associative (SSA) cache, that cuts most of the static power dissipation of caches without incurring performance penalties. The SSA cache reduces leakage power by implementing the minimum set-associative scheme, which activates only the minimal number of ways in each cache set, while the performance losses caused by this scheme are compensated for by the base-offset load/store queues. The rationale for combining these two techniques is locality: as the contents of the cache blocks in the current working set are repeatedly accessed, the same addresses are computed again and again. The SSA cache architecture can be applied to both data and instruction caches. Experimental results show that SSA can cut static power consumption of the L1 data cache by 93%, on average, for SPECint2000 benchmarks, while execution times are reduced by 5%. Similarly, SSA can cut leakage dissipation of the L1 instruction cache by 92%, on average, and improve performance by over 3%. Furthermore, when SSA is adopted for both L1 data and instruction caches, the normalized leakage of the L1 data and instruction caches is lowered to 8%, on average, while still achieving a 2% reduction in execution times.
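
The idea of activating only the minimal number of ways in each set can be illustrated with a toy model. The sketch below is not the paper's mechanism. It assumes that a way is powered on only when a set first needs it, that replacement within a set is LRU, and that leakage is proportional to the number of powered-on ways. The class name `MinWaysCache`, its parameters, and the address trace are all hypothetical, and the base-offset load/store queues are not modeled.

```python
from collections import OrderedDict


class MinWaysCache:
    """Toy set-associative cache that powers on ways lazily (illustrative only)."""

    def __init__(self, num_sets=4, max_ways=4, block_size=32):
        self.num_sets = num_sets
        self.max_ways = max_ways
        self.block_size = block_size
        # One OrderedDict per set, holding tags in LRU order.
        # Its length is the number of ways currently powered on in that set.
        self.sets = [OrderedDict() for _ in range(num_sets)]

    def access(self, addr):
        """Return True on a hit, False on a miss."""
        block = addr // self.block_size
        index = block % self.num_sets
        tag = block // self.num_sets
        ways = self.sets[index]
        if tag in ways:
            ways.move_to_end(tag)  # mark as most recently used
            return True
        if len(ways) == self.max_ways:
            ways.popitem(last=False)  # all ways in use: evict the LRU block
        ways[tag] = None  # assumption: a miss powers on one more way if available
        return False

    def active_fraction(self):
        """Fraction of all ways that are powered on (proxy for normalized leakage)."""
        active = sum(len(s) for s in self.sets)
        return active / (self.num_sets * self.max_ways)


cache = MinWaysCache(num_sets=4, max_ways=4, block_size=32)
for addr in [0, 32, 0, 32, 0, 128]:  # hypothetical trace with a small working set
    cache.access(addr)
print(cache.active_fraction())  # 0.1875: 3 of 16 ways are powered on
```

With a small, frequently reused working set, most ways in this model are never powered on, which is the intuition behind tying leakage savings to locality.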