High-performance caches statically pull up the bitlines in all cache subarrays to optimize cache access latency. Unfortunately, such an architecture results in a significant waste of energy in nanoscale CMOS implementations due to high leakage and bitline discharge in the unaccessed subarrays. Recent research advocates bitline isolation to control precharging of individual subarrays using bitline precharge devices. In this paper, we carefully evaluate the energy and performance trade-offs of bitline isolation, and propose a technique to exploit nearly its full potential to eliminate discharge and reduce overall energy in level-one caches. Cycle-accurate and circuit simulation results of a wide-issue superscalar processor indicate that: (1) in future CMOS technologies (e.g., 70nm and beyond), cache architectures that exploit bitline isolation can eliminate up to 90% of the bitline discharge, (2) on-demand precharging (i.e., decoding the address and subsequently precharging the accessed subarrays) is not viable in level-one caches because precharging increases the cache access latency, and (3) our proposal for gated precharging, which exploits subarray reference locality and precharges only the recently accessed subarrays, eliminates nearly all of the bitline discharge in nanoscale CMOS caches with only a 1% performance degradation.
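The gated-precharging idea in point (3) can be illustrated with a small behavioral sketch. This is not the paper's circuit or simulator. It assumes a hypothetical `window` parameter (the number of cycles a subarray stays precharged after its last access) and a trace with one subarray access per cycle. The function name and the "precharged subarray-cycles" count are illustrative stand-ins, not energy figures from the paper.

```python
def simulate_gated_precharge(trace, num_subarrays, window):
    """Toy model of gated precharging (illustrative assumptions only).

    trace         -- sequence of subarray indices, one access per cycle
    num_subarrays -- total number of subarrays in the cache
    window        -- hypothetical: cycles a subarray stays precharged
                     after its most recent access

    Returns (delayed_accesses, gated_subarray_cycles, static_subarray_cycles).
    """
    last_access = {}          # subarray index -> cycle of its last access
    delayed_accesses = 0      # accesses that found the subarray isolated
    gated_subarray_cycles = 0 # subarray-cycles spent with bitlines pulled up

    for cycle, subarray in enumerate(trace):
        # Subarrays still held precharged at the start of this cycle.
        precharged = {
            s for s, t in last_access.items() if cycle - t <= window
        }
        gated_subarray_cycles += len(precharged)

        # An access to an isolated subarray must wait for precharge.
        if subarray not in precharged:
            delayed_accesses += 1

        last_access[subarray] = cycle

    # Static pull-up keeps every subarray precharged in every cycle.
    static_subarray_cycles = num_subarrays * len(trace)
    return delayed_accesses, gated_subarray_cycles, static_subarray_cycles


if __name__ == "__main__":
    # Hypothetical trace with strong subarray reference locality.
    example_trace = [0, 0, 1, 0, 3, 3, 3, 0]
    print(simulate_gated_precharge(example_trace, num_subarrays=4, window=2))
```

In this sketch, a larger `window` reduces the number of delayed accesses but keeps more subarrays precharged, which mirrors the energy and performance trade-off the abstract describes. The actual policy and thresholds used by the authors are not given here.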