A way-halting cache for low-energy high-performance systems

  • Authors: Chuanjun Zhang, Frank Vahid, Jun Yang, Walid Najjar

  • Affiliations: San Diego State University, San Diego, CA (Zhang); University of California, Riverside (Vahid, Yang, Najjar)

  • Venue: ACM Transactions on Architecture and Code Optimization (TACO)
  • Year: 2005

Abstract

Caches account for much of a microprocessor system's power and energy consumption. Numerous new cache architectures, such as phased, pseudo-set-associative, way-predicting, reactive-associative, way-shutdown, way-concatenating, and highly-associative caches, are intended to reduce power and/or energy, but they all impose some performance overhead. We have developed a new cache architecture, called a way-halting cache, that reduces energy further than the previously mentioned architectures while imposing no performance overhead. Our way-halting cache is a four-way set-associative cache that stores the four lowest-order bits of all ways' tags in a fully associative memory, which we call the halt tag array. The lookup in the halt tag array is done in parallel with, and is no slower than, the set-index decoding. The halt tag array predetermines which tags cannot match because their low-order 4 bits mismatch; further access to ways with known mismatching tags is then halted, saving power. Our halt tag array has the additional feature of using only static logic, rather than the dynamic logic used in highly associative caches, making our cache simpler to design with existing tools. We provide data from experiments on 29 benchmarks drawn from Powerstone, MediaBench, and SPEC 2000, based on our layouts in 0.18-micron CMOS technology. On average, we obtained 55% savings in memory-access-related energy over a conventional four-way set-associative cache. We show that these savings are greater than those of previous methods, and nearly twice those of highly associative caches, while imposing no performance overhead and only 2% cache area overhead.
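To illustrate the lookup flow the abstract describes, the C sketch below models a halt tag array behaviorally. It is a minimal sketch, not the authors' implementation: the cache geometry (64 sets, 32-byte lines, 32-bit addresses) and all identifiers are assumptions chosen for the example, and the sequential loop stands in for hardware that checks all ways' halt tags in parallel with set-index decoding.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical geometry for illustration only. */
#define WAYS        4
#define SETS        64
#define OFFSET_BITS 5   /* 32-byte line */
#define INDEX_BITS  6   /* 64 sets */

typedef struct {
    uint32_t tag[WAYS];      /* full tags */
    uint8_t  halt_tag[WAYS]; /* low-order 4 bits of each way's tag */
    bool     valid[WAYS];
} cache_set_t;

static cache_set_t cache[SETS];

/* Returns the hitting way, or -1 on a miss. `halted` counts ways whose
 * full tag and data arrays were never activated, i.e. energy saved. */
int lookup(uint32_t addr, int *halted)
{
    uint32_t index = (addr >> OFFSET_BITS) & (SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    cache_set_t *set = &cache[index];
    int hit_way = -1;
    *halted = 0;

    for (int w = 0; w < WAYS; w++) {
        /* Halt-tag check: a mismatch in the low 4 tag bits means this
         * way cannot hit, so it is halted before the full tag compare.
         * (An invalid way is modeled as halted here as well.) */
        if (!set->valid[w] || set->halt_tag[w] != (tag & 0xFu)) {
            (*halted)++;
            continue;
        }
        /* Only surviving ways pay for the full tag comparison. */
        if (set->tag[w] == tag)
            hit_way = w;
    }
    return hit_way;
}

int main(void)
{
    /* Install one line in way 2, then look it up. */
    uint32_t addr  = 0x12345678u;
    uint32_t index = (addr >> OFFSET_BITS) & (SETS - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    cache[index].tag[2]      = tag;
    cache[index].halt_tag[2] = tag & 0xFu;
    cache[index].valid[2]    = true;

    int halted;
    int way = lookup(addr, &halted);
    printf("hit way %d, %d of %d ways halted\n", way, halted, WAYS);
    return 0;
}
```

In this toy run, three of the four ways are halted before any full tag comparison; the paper's energy savings come from exactly this effect, with the 4-bit halt tags filtering out most non-matching ways at negligible area cost.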