Set-associative caches achieve low miss rates for typical applications but dissipate significant energy. To minimize access time, they probe all the data ways in parallel with the tag lookup, even though only the output of the matching way is used; the energy spent accessing the other ways is wasted. Performing the data lookup sequentially after the tag lookup eliminates this waste but substantially increases cache access time, which is unacceptable for high-performance L1 caches. In this paper, we apply two previously proposed techniques, way-prediction and selective direct-mapping, to reduce L1 cache dynamic energy while maintaining high performance. The techniques predict the matching way and probe only the predicted way rather than all the ways, which saves energy. Although these techniques were originally proposed to improve set-associative cache access times, this is the first paper to apply them to reducing cache energy.

We evaluate the effectiveness of these techniques in reducing L1 d-cache, L1 i-cache, and overall processor energy. Using these techniques, our caches achieve the energy-delay of sequential access while maintaining the performance of parallel access. Relative to parallel-access L1 i- and d-caches, the techniques reduce overall processor energy-delay by 8%, while perfect way-prediction with no performance degradation achieves a 10% reduction. The performance degradation of the techniques is less than 3% compared to an aggressive, 1-cycle, 4-way, parallel-access cache.
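The idea of probing only a predicted way can be illustrated with a small simulation. The sketch below is a toy model, not the design evaluated in the paper: the class name `WayPredictedCache`, the most-recently-used prediction policy, the LRU replacement, and the rule that a misprediction costs a second probe of all remaining ways are assumptions chosen for illustration. It counts data-way reads as a rough proxy for dynamic energy and compares them with a parallel-access cache that reads every way on every access.

```python
class WayPredictedCache:
    """Toy set-associative cache that probes one predicted data way first.

    Assumptions (not from the paper): the predictor picks the most recently
    used way of each set, replacement is LRU, and a first-probe failure
    costs one extra read of every remaining way.
    """

    def __init__(self, num_sets=64, num_ways=4, block_bytes=32):
        self.num_sets = num_sets
        self.num_ways = num_ways
        self.block_bytes = block_bytes
        self.tags = [[None] * num_ways for _ in range(num_sets)]
        # lru[s] lists the ways of set s, least recently used first.
        self.lru = [list(range(num_ways)) for _ in range(num_sets)]
        # predicted[s] is the way probed first for set s.
        self.predicted = [0] * num_sets
        self.accesses = 0
        self.way_probes = 0  # data-way reads with way-prediction

    def access(self, addr):
        block = addr // self.block_bytes
        s = block % self.num_sets
        tag = block // self.num_sets
        self.accesses += 1

        guess = self.predicted[s]
        self.way_probes += 1  # first probe: predicted way only
        if self.tags[s][guess] == tag:
            way, outcome = guess, "hit-predicted"
        else:
            self.way_probes += self.num_ways - 1  # second probe: other ways
            if tag in self.tags[s]:
                way, outcome = self.tags[s].index(tag), "hit-mispredicted"
            else:
                way, outcome = self.lru[s][0], "miss"  # evict the LRU way
                self.tags[s][way] = tag

        # Mark the way as most recently used and predict it next time.
        self.lru[s].remove(way)
        self.lru[s].append(way)
        self.predicted[s] = way
        return outcome

    def parallel_probes(self):
        """Data-way reads a parallel-access cache would need for the same trace."""
        return self.accesses * self.num_ways


if __name__ == "__main__":
    cache = WayPredictedCache(num_sets=1, num_ways=2, block_bytes=1)
    for addr in [0, 0, 1, 0, 0]:
        print(addr, cache.access(addr))
    print("way-predicted probes:", cache.way_probes)
    print("parallel probes:", cache.parallel_probes())
```

In this model, a correct prediction reads one data way instead of all of them, while a misprediction reads as many ways as parallel access and would also add latency in hardware; the savings therefore depend on prediction accuracy.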