Near-Optimal Precharging in High-Performance Nanoscale CMOS Caches

Authors:
Se-Hyun Yang;Babak Falsafi
Affiliations:
Computer Architecture Laboratory (CALCM), Carnegie Mellon University;Computer Architecture Laboratory (CALCM), Carnegie Mellon University
Venue:
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Year:
2003

Citing 16
Cited 4

Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor

Digital Technical Journal - Special 10th anniversary issue
A 160-MHz, 32-b, 0.5-W CMOS RISC microprocessor

Digital Technical Journal
Way-predicting set-associative cache for high performance and low energy consumption

ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
Selective cache ways: on-demand cache resource allocation

Proceedings of the 32nd annual ACM/IEEE international symposium on Microarchitecture
Wattch: a framework for architectural-level power analysis and optimizations

Proceedings of the 27th annual international symposium on Computer architecture
Reconfigurable caches and their application to media processing

Proceedings of the 27th annual international symposium on Computer architecture
The optimal logic depth per pipeline stage is 6 to 8 FO4 inverter delays

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Dynamic fine-grain leakage reduction using leakage-biased bitlines

ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Design of High-Performance Microprocessor Circuits

Design of High-Performance Microprocessor Circuits
Reducing set-associative cache energy via way-prediction and selective direct-mapping

Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Automatically characterizing large scale program behavior

Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
The MIPS R10000 Superscalar Microprocessor

IEEE Micro
Design Challenges of Technology Scaling

IEEE Micro
Drowsy instruction caches: leakage power reduction using dynamic voltage scaling and cache sub-bank prediction

Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
An Integrated Circuit/Architecture Approach to Reducing Leakage in Deep-Submicron High-Performance I-Caches

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Exploiting Choice in Resizable Cache Design to Optimize Deep-Submicron Processor Energy-Delay

HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture

Single-vDD and single-vT super-drowsy techniques for low-leakage high-performance instruction caches

Proceedings of the 2004 international symposium on Low power electronics and design
On-Demand Solution to Minimize I-Cache Leakage Energy with Maintaining Performance

IEEE Transactions on Computers
Segmented bitline cache: exploiting non-uniform memory access patterns

HiPC'06 Proceedings of the 13th international conference on High Performance Computing
Using branch prediction information for near-optimal i-cache leakage

ACSAC'06 Proceedings of the 11th Asia-Pacific conference on Advances in Computer Systems Architecture

Quantified Score

Hi-index	0.02

Visualization

Abstract

High-performance caches statically pull up the bit-linesin all cache subarrays to optimize cache accesslatency. Unfortunately, such an architecture results in asignificant waste of energy in nanoscale CMOS implementationsdue to high leakage and bitline discharge inthe unaccessed subarrays. Recent research advocatesbitline isolation to control precharging of individualsubarrays using bitline precharge devices. In this paper,we carefully evaluate the energy and performancetrade-offs of bitline isolation, and propose a techniqueto exploit nearly its full potential to eliminate dischargeand reduce overall energy in level-one caches.Cycle-accurate and circuit simulation results of awide-issue superscalar processor indicate that: (1) infuture CMOS technologies (e.g., 70nm and beyond),cache architectures that exploit bitline isolation caneliminate up to 90% of the bitline discharge, (2) on-demandprecharging (i.e., decoding the address andsubsequently precharging the accessed subarrays) is notviable in level-one caches because prechargingincreases the cache access latency, and (3) our proposalfor gated precharging to exploit subarray referencelocality and precharging only the recently accessed sub-arrayseliminates nearly all of bitline discharge innanoscale CMOS caches with only a 1% of performancedegradation.