Single-VDD and single-VT super-drowsy techniques for low-leakage high-performance instruction caches

Authors:
Nam Sung Kim; Krisztián Flautner; David Blaauw; Trevor Mudge
Affiliations:
Intel Corp.; ARM Ltd.; University of Michigan; University of Michigan
Venue:
Proceedings of the 2004 international symposium on Low power electronics and design
Year:
2004

Abstract

In this paper, we present a circuit technique that supports a super-drowsy mode with a single VDD. In addition, we perform a detailed working-set analysis of various cache line update policies for placing lines in a drowsy state. The analysis identifies a policy for instruction caches that performs as well as or better than more complex schemes proposed in the past. Furthermore, as an alternative to using high-threshold devices to reduce the bitline leakage through the access transistors in drowsy caches, we propose a gated bitline precharge technique. With this technique, a single-threshold (single-VT) process is sufficient. The gated precharge employs a simple but effective predictor that almost completely hides the performance loss incurred by transitions between sub-banks: a 64-entry predictor with 3 bits per entry reduces the run-time increase by 78%, which is as effective as previous proposals that used content-addressable predictors with 40 bits per entry. Overall, the combination of the proposed techniques reduces leakage power by 72% with a negligible (0.4%) run-time increase.
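A working-set analysis of a drowsy update policy asks how many lines must be awake in each interval. The sketch below is a minimal illustration that assumes a simple periodic policy: every line is put into drowsy mode at the start of each fixed window of cycles and is woken on its first access in that window. This policy, the `(cycle, line_index)` trace format, and the names `working_set_sizes` and `mean_drowsy_fraction` are hypothetical and are not the specific policy the paper identifies.

```python
from typing import Iterable, List, Set, Tuple

def working_set_sizes(trace: Iterable[Tuple[int, int]], window: int) -> List[int]:
    """Number of distinct cache lines touched in each fixed window of cycles.

    Hypothetical periodic policy: at the start of every window all lines are
    put into drowsy mode, and a line is woken on its first access in the
    window, so the working-set size equals the number of awake lines.
    `trace` holds (cycle, line_index) pairs with non-decreasing cycles.
    """
    sizes: List[int] = []
    current = 0
    awake: Set[int] = set()
    for cycle, line in trace:
        w = cycle // window
        while w > current:          # close finished windows (including empty ones)
            sizes.append(len(awake))
            awake = set()           # every line goes back to drowsy mode
            current += 1
        awake.add(line)             # first access in the window wakes the line
    sizes.append(len(awake))        # close the last window
    return sizes

def mean_drowsy_fraction(sizes: List[int], num_lines: int) -> float:
    """Average fraction of lines left in drowsy mode across all windows."""
    return 1.0 - sum(sizes) / (len(sizes) * num_lines)

trace = [(0, 1), (5, 2), (7, 1), (12, 3), (25, 4), (26, 4)]
sizes = working_set_sizes(trace, window=10)
print(sizes)                            # [2, 1, 1]
print(mean_drowsy_fraction(sizes, 32))  # fraction of a 32-line cache kept drowsy
```

In this simple model, each line counted in a window also incurred exactly one wake-up. The same numbers therefore estimate both the leakage saved (the drowsy fraction) and the number of wake-up delays that cost performance, which is the trade-off a policy comparison weighs.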
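The abstract gives only the predictor's size: 64 entries with 3 bits each. The sketch below shows how a table that small could steer gated precharge. It assumes the 3 bits hold the id of the next sub-bank to precharge (so up to 8 sub-banks), that entries are indexed by low-order instruction-address bits, and that an entry is overwritten with the most recent target. All of these are assumptions made for illustration, and the names `SubBankPredictor` and `exposed_transitions` are hypothetical, not the paper's design.

```python
from typing import Iterable, List, Tuple

class SubBankPredictor:
    """Hypothetical direct-mapped next-sub-bank predictor for gated precharge.

    Each entry stores the id of the sub-bank predicted to be accessed next,
    so 8 sub-banks need 3 bits per entry. Entries are indexed by low-order
    bits of an instruction address (an assumption made for this sketch).
    """

    def __init__(self, entries: int = 64, num_subbanks: int = 8) -> None:
        if entries <= 0 or entries & (entries - 1):
            raise ValueError("entries must be a positive power of two")
        self.entries = entries
        self.num_subbanks = num_subbanks
        self.bits_per_entry = max(1, (num_subbanks - 1).bit_length())
        self.table: List[int] = [0] * entries

    def _index(self, pc: int) -> int:
        return (pc >> 2) & (self.entries - 1)  # assumes 4-byte instructions

    def predict(self, pc: int) -> int:
        return self.table[self._index(pc)]

    def update(self, pc: int, subbank: int) -> None:
        if not 0 <= subbank < self.num_subbanks:
            raise ValueError("sub-bank id out of range")
        self.table[self._index(pc)] = subbank

    @property
    def storage_bits(self) -> int:
        return self.entries * self.bits_per_entry

def exposed_transitions(transitions: Iterable[Tuple[int, int]],
                        predictor: SubBankPredictor) -> int:
    """Count sub-bank transitions whose target was not precharged in advance.

    `transitions` holds (pc, next_subbank) pairs, where pc is the instruction
    after which fetch moves to a different sub-bank. A correct prediction
    means the target sub-bank was precharged early, hiding the delay.
    """
    misses = 0
    for pc, next_subbank in transitions:
        if predictor.predict(pc) != next_subbank:
            misses += 1                     # precharge delay exposed to fetch
        predictor.update(pc, next_subbank)  # remember the latest target
    return misses

p = SubBankPredictor()
print(p.storage_bits)  # 192 (64 entries x 3 bits)
trace = [(0x104, 3), (0x104, 3), (0x208, 5), (0x104, 3), (0x208, 5)]
print(exposed_transitions(trace, p))  # 2
```

Under these assumptions, the whole table occupies 64 × 3 = 192 bits, while a single entry of the content-addressable predictors mentioned above already takes 40 bits. A direct-mapped table also avoids the associative search that a CAM lookup performs.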