Cache designs with partial address matching
MICRO 27 Proceedings of the 27th annual international symposium on Microarchitecture
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor
Digital Technical Journal - Special 10th anniversary issue
Reducing the frequency of tag compares for low power I-cache design
ISLPED '95 Proceedings of the 1995 international symposium on Low power design
ISCA '96 Proceedings of the 23rd annual international symposium on Computer architecture
MediaBench: a tool for evaluating and synthesizing multimedia and communicatons systems
MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
Way-predicting set-associative cache for high performance and low energy consumption
ISLPED '99 Proceedings of the 1999 international symposium on Low power electronics and design
The TLB slice—a low-cost high-speed address translation mechanism
ISCA '90 Proceedings of the 17th annual international symposium on Computer Architecture
A low power unified cache architecture providing power and performance flexibility (poster session)
ISLPED '00 Proceedings of the 2000 international symposium on Low power electronics and design
L1 data cache decomposition for energy efficiency
ISLPED '01 Proceedings of the 2001 international symposium on Low power electronics and design
Data cache energy minimizations through programmable tag size matching to the applications
Proceedings of the 14th international symposium on Systems synthesis
Reducing set-associative cache energy via way-prediction and selective direct-mapping
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
Direct addressed caches for reduced power consumption
Proceedings of the 34th annual ACM/IEEE international symposium on Microarchitecture
An adaptive serial-parallel CAM architecture for low-power cache blocks
Proceedings of the 2002 international symposium on Low power electronics and design
Computer architecture: a quantitative approach
Computer architecture: a quantitative approach
Power Management in the Amulet Microprocessors
IEEE Design & Test
SH3: High Code Density, Low Power
IEEE Micro
Proceedings of the 2001 International Conference on Parallel Architectures and Compilation Techniques
Energy efficient frequent value data cache design
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
ASYNC '96 Proceedings of the 2nd International Symposium on Advanced Research in Asynchronous Circuits and Systems
Predictive sequential associative cache
HPCA '96 Proceedings of the 2nd IEEE Symposium on High-Performance Computer Architecture
A highly configurable cache architecture for embedded systems
Proceedings of the 30th annual international symposium on Computer architecture
A way-halting cache for low-energy high-performance systems
Proceedings of the 2004 international symposium on Low power electronics and design
A highly configurable cache for low energy embedded systems
ACM Transactions on Embedded Computing Systems (TECS)
Balanced Cache: Reducing Conflict Misses of Direct-Mapped Caches
Proceedings of the 33rd annual international symposium on Computer Architecture
Reducing cache misses through programmable decoders
ACM Transactions on Architecture and Code Optimization (TACO)
Word-interleaved cache: an energy efficient data cache architecture
Proceedings of the 13th international symposium on Low power electronics and design
Optimizing CAM-based instruction cache designs for low-power embedded systems
Journal of Systems Architecture: the EUROMICRO Journal
Recruiting Decay for Dynamic Power Reduction in Set-Associative Caches
Transactions on High-Performance Embedded Architectures and Compilers II
Tolerating process variations in large, set-associative caches: The buddy cache
ACM Transactions on Architecture and Code Optimization (TACO)
Way guard: a segmented counting bloom filter approach to reducing energy for set-associative caches
Proceedings of the 14th ACM/IEEE international symposium on Low power electronics and design
Temperature reduction analysis in Sentry Tag cache systems
Proceedings of the 10th workshop on MEmory performance: DEaling with Applications, systems and architecture
Online cache state dumping for processor debug
Proceedings of the 46th Annual Design Automation Conference
Applying decay to reduce dynamic power in set-associative caches
HiPEAC'07 Proceedings of the 2nd international conference on High performance embedded architectures and compilers
Enabling large decoded instruction loop caching for energy-aware embedded processors
CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
Microprocessors & Microsystems
Selective word reading for high performance and low power processor
Proceedings of the 2011 ACM Symposium on Research in Applied Computation
DLIC: Decoded loop instructions caching for energy-aware embedded processors
ACM Transactions on Embedded Computing Systems (TECS)
Designing a practical data filter cache to improve both energy efficiency and performance
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Caches contribute to much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way predicting, reactive-associative, way-shutdown, way-concatenating, and highly-associative, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than previously mentioned architectures, while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags into a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array predetermines which tags cannot match due to their low-order 4 bits mismatching. Further accesses to ways with known mismatching tags are then halted, thus saving power. Our halt tag array has an additional feature of using static logic only, rather than dynamic logic used in highly associative caches, making our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, Mediabench, and Spec 2000, based on our layouts in 0.18 micron CMOS technology. On average, we obtained 55% savings of memory-access related energy over a conventional four-way set-associative cache. We show that savings are greater than previous methods, and nearly twice that of highly associative caches, while imposing no performance overhead and only 2% cache area overhead.