On high-bandwidth data cache design for multi-issue processors
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Power considerations in the design of the Alpha 21264 microprocessor
DAC '98 Proceedings of the 35th annual Design Automation Conference
A low power SRAM using auto-backgate-controlled MT-CMOS
ISLPED '98 Proceedings of the 1998 international symposium on Low power electronics and design
Gradient-based optimization of custom circuits using a static-timing formulation
Proceedings of the 36th annual ACM/IEEE Design Automation Conference
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
ICCAD '99 Proceedings of the 1999 IEEE/ACM international conference on Computer-aided design
Focusing processor policies via critical-path prediction
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
DRG-cache: a data retention gated-ground cache for low power
Proceedings of the 39th annual Design Automation Conference
Dynamic fine-grain leakage reduction using leakage-biased bitlines
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Drowsy caches: simple techniques for reducing leakage power
ISCA '02 Proceedings of the 29th annual international symposium on Computer architecture
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Lancelot: A FORTRAN Package for Large-Scale Nonlinear Optimization (Release A)
Lancelot: A FORTRAN Package for Large-Scale Nonlinear Optimization (Release A)
The Non-Critical Buffer: Using Load Latency Tolerance to Improve Data Cache Efficiency
ICCD '99 Proceedings of the 1999 IEEE International Conference on Computer Design
Stack Value File: Custom Microarchitecture for the Stack
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Dynamic Prediction of Critical Path Instructions
HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Static Energy Reduction Techniques for Microprocessor Caches
ICCD '01 Proceedings of the International Conference on Computer Design: VLSI in Computers & Processors
Power Efficient Data Cache Designs
ICCD '03 Proceedings of the 21st International Conference on Computer Design
A leakage-energy-reduction technique for cache memories in embedded processors
Journal of Embedded Computing - Embeded Processors and Systems: Architectural Issues and Solutions for Emerging Applications
Reconfigurable energy efficient near threshold cache architectures
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
As technology scales and processor speeds improve, power has become a first-order design constraint in all aspects of processor design. In this paper, we explore the use of criticality metrics to reduce dynamic and leakage energy within data caches. We leverage the ability to predict whether an access is in the application’s critical path to partition the accesses into multiple streams. Accesses in the critical path are serviced by a high-performance (hot) cache bank. Accesses not in the critical path are serviced by a lower energy (and lower performance (cold)) cache bank. The resulting organization is a physically banked cache with different levels of energy consumption and performance in each bank. Our results demonstrate that such a classification of instructions and data across two streams can be achieved with high accuracy. Each additional cycle in the cold cache access time slows performance down by only 0.8%. However, such a partition can increase contention for cache banks and entail non-negligible hardware overhead. While prior research has effectively employed criticality metrics to reduce power in arithmetic units, our analysis shows that the success of these techniques are limited when applied to data caches.