Let caches decay: reducing leakage energy via exploitation of cache generational behavior

Authors:
Zhigang Hu; Stefanos Kaxiras; Margaret Martonosi
Affiliations:
Princeton University, Princeton, NJ; Agere Systems, Murray Hill, NJ; Princeton University, Princeton, NJ
Venue:
ACM Transactions on Computer Systems (TOCS)
Year:
2002

Abstract

Power dissipation is increasingly important in CPUs, from those intended for mobile use all the way up to high-performance processors for high-end servers. Although the bulk of the power dissipated is dynamic switching power, leakage power is also becoming a concern. Chipmakers expect that in future chip generations, leakage's proportion of total chip power will increase significantly. This article examines methods for reducing leakage power within the cache memories of the CPU. Because caches account for much of a CPU chip's area and transistor count, they are reasonable targets for attacking leakage. We discuss policies and implementations for reducing cache leakage by invalidating and "turning off" cache lines when they hold data not likely to be reused. In particular, our approach targets the generational nature of cache line usage: cache lines typically see a flurry of frequent use when first brought into the cache, followed by a period of "dead time" before they are evicted. By devising effective, low-power ways of deducing dead time, we show that in many cases L1 cache leakage energy can be reduced by 4x in SPEC2000 applications without affecting performance. Because our decay-based techniques are rooted in competitive online algorithms, their energy usage can be theoretically bounded to within a factor of two of the optimal oracle-based policy. We also examine adaptive decay-based policies that make energy-minimizing policy choices on a per-application basis by choosing appropriate decay intervals individually for each cache line. Our proposed adaptive policies reduce L1 cache leakage energy by 5x for the SPEC2000 benchmarks with only negligible degradation in performance.
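
The factor-of-two bound mentioned in the abstract can be illustrated with a small sketch. The cost model below is an assumption made for illustration, not taken from the abstract: each idle cycle of a powered line costs `leak` energy units, a miss induced by turning a line off costs `miss` units, and every idle gap is treated as ending in a reuse. The function names and the access trace are hypothetical. Under this model, setting the decay interval to `miss / leak` keeps the online policy within twice the cost of an oracle that knows each gap in advance.

```python
from typing import List


def idle_gaps(access_cycles: List[int]) -> List[int]:
    """Idle cycles between consecutive accesses to one cache line."""
    return [b - a for a, b in zip(access_cycles, access_cycles[1:])]


def oracle_cost(gap: int, leak: float, miss: float) -> float:
    """Hindsight policy: keep the line on for the whole gap, or turn it
    off immediately and pay one induced miss, whichever is cheaper."""
    return min(leak * gap, miss)


def decay_cost(gap: int, leak: float, miss: float, interval: int) -> float:
    """Online decay policy: keep the line on for `interval` idle cycles,
    then turn it off. A reuse after that point pays an induced miss."""
    if gap <= interval:
        return leak * gap           # reused before the line decayed
    return leak * interval + miss   # leaked until decay, then missed


# Assumed energy parameters (arbitrary units) and a made-up access trace.
LEAK_PER_CYCLE = 1.0
MISS_ENERGY = 8.0
DECAY_INTERVAL = int(MISS_ENERGY / LEAK_PER_CYCLE)  # break-even point

trace = [0, 2, 3, 5, 40, 41, 100]  # cycles at which the line is accessed
gaps = idle_gaps(trace)

decay_total = sum(
    decay_cost(g, LEAK_PER_CYCLE, MISS_ENERGY, DECAY_INTERVAL) for g in gaps
)
oracle_total = sum(oracle_cost(g, LEAK_PER_CYCLE, MISS_ENERGY) for g in gaps)
always_on_total = LEAK_PER_CYCLE * (trace[-1] - trace[0])

print(f"always on: {always_on_total}")
print(f"decay:     {decay_total}")
print(f"oracle:    {oracle_total}")
print(f"decay / oracle = {decay_total / oracle_total:.2f}")
```

For this trace the decay policy spends 38 units against 22 for the oracle and 100 for a line that is never turned off. Short gaps cost the same under both policies, and each long gap costs the decay policy exactly twice the oracle's cost, which is where the factor of two comes from in this simplified model.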