A low power SRAM using auto-backgate-controlled MT-CMOS
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Analysis of dual-Vt SRAM cells with full-swing single-ended bit line sensing for on-chip cache
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Proceedings of the 2003 international symposium on Low power electronics and design
Exploiting program hotspots and code sequentiality for instruction cache leakage management
Proceedings of the 2003 international symposium on Low power electronics and design
Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Circuit and microarchitectural techniques for reducing cache leakage power
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
IATAC: a smart predictor to turn-off L2 cache lines
ACM Transactions on Architecture and Code Optimization (TACO)
Exploring the limits of leakage power reduction in caches
ACM Transactions on Architecture and Code Optimization (TACO)
Compiler Directed Early Register Release
Proceedings of the 14th International Conference on Parallel Architectures and Compilation Techniques
ACM Transactions on Architecture and Code Optimization (TACO)
A study of thread migration in temperature-constrained multicores
ACM Transactions on Architecture and Code Optimization (TACO)
Energy efficient near-threshold chip multi-processing
ISLPED '07 Proceedings of the 2007 international symposium on Low power electronics and design
Reducing leakage in power-saving capable caches for embedded systems by using a filter cache
MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance
IEEE Transactions on Computers
Capturing and optimizing the interactions between prefetching and cache line turnoff
Microprocessors & Microsystems
Reconfigurable energy efficient near threshold cache architectures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Exploring the limits of early register release: Exploiting compiler analysis
ACM Transactions on Architecture and Code Optimization (TACO)
Selective wordline voltage boosting for caches to manage yield under process variations
Proceedings of the 46th Annual Design Automation Conference
Low Vccmin fault-tolerant cache with highly predictable performance
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Cache partitioning for energy-efficient and interference-free embedded multitasking
ACM Transactions on Embedded Computing Systems (TECS)
WHOLE: a low energy I-cache with separate way history
ICCD'09 Proceedings of the 2009 IEEE international conference on Computer design
Two fast methods for estimating the minimum standby supply voltage for large SRAMs
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
An enhanced canary-based system with BIST for SRAM standby power reduction
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Using branch prediction information for near-optimal i-cache leakage
ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Toward application-specific memory reconfiguration for energy efficiency
E2SC '13 Proceedings of the 1st International Workshop on Energy Efficient Supercomputing
Hi-index | 0.00 |
In this paper, we present a circuit technique that supports a super-drowsy mode with a single-V DD . In addition, we perform a detailed working set analysis for various cache line update policies for placing lines in a drowsy state. The analysis presents a policy for an instruction cache and shows it is as good as or better than more complex schemes proposed in the past. Furthermore, as an alternative to using high-threshold devices to reduce the bitline leakage through access transistors in drowsy caches, we propose a gated bitline precharge technique. A single threshold process is now sufficient. The gated precharge employs a simple but effective predictor that almost completely hides any performance loss incurred by the transitions between sub-banks. A 64-entry predictor with 3 bits per entry reduces the run-time increase by 78%, which is as effective as previous proposals that used content addressable predictors with 40 bits per entry. Overall, the combination of the proposed techniques reduces the leakage power by 72% with negligible (0.4%) run-time increase.