Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Authors:
Soontae Kim; Jongmin Lee; Jesung Kim; Seokin Hong
Affiliations:
KAIST, Gwahangno Yuseong-gu, Daejeon, Korea; KAIST, Gwahangno Yuseong-gu, Daejeon, Korea; LG Electronics, Gasan-dong Geumchun-gu, Seoul, Korea; KAIST, Gwahangno Yuseong-gu, Daejeon, Korea
Venue:
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2011

References (26):
Internal organization of the Alpha 21164, a 300-MHz 64-bit quad-issue CMOS RISC microprocessor. Digital Technical Journal, Special 10th Anniversary Issue.
Retrospective: lockup-free instruction fetch/prefetch cache organization. 25 Years of the International Symposia on Computer Architecture (Selected Papers).
Reducing power in superscalar processor caches using subbanking, multiple line buffers and bit-line segmentation. ISLPED '99: Proceedings of the 1999 International Symposium on Low Power Electronics and Design.
A fully associative software-managed cache design. Proceedings of the 27th Annual International Symposium on Computer Architecture.
Dynamic zero compression for cache energy reduction. Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture.
Frequent value compression in data caches. Proceedings of the 33rd Annual ACM/IEEE International Symposium on Microarchitecture.
An on-chip cache compression technique to reduce decompression overhead and design complexity. Journal of Systems Architecture: the EUROMICRO Journal.
Cache decay: exploiting generational behavior to reduce cache leakage power. ISCA '01: Proceedings of the 28th Annual International Symposium on Computer Architecture.
Drowsy caches: simple techniques for reducing leakage power. ISCA '02: Proceedings of the 29th Annual International Symposium on Computer Architecture.
Energy efficient frequent value data cache design. Proceedings of the 35th Annual ACM/IEEE International Symposium on Microarchitecture.
Dynamically Exploiting Narrow Width Operands to Improve Processor Power and Performance. HPCA '99: Proceedings of the 5th International Symposium on High Performance Computer Architecture.
Just Say No: Benefits of Early Cache Miss Determination. HPCA '03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture.
Performance of Hardware Compressed Main Memory. HPCA '01: Proceedings of the 7th International Symposium on High-Performance Computer Architecture.
Adaptive Cache Compression for High-Performance Processors. Proceedings of the 31st Annual International Symposium on Computer Architecture.
A Unified Compressed Memory Hierarchy. HPCA '05: Proceedings of the 11th International Symposium on High-Performance Computer Architecture.
A Robust Main-Memory Compression Scheme. Proceedings of the 32nd Annual International Symposium on Computer Architecture.
Restrictive Compression Techniques to Increase Level 1 Cache Capacity. ICCD '05: Proceedings of the 2005 International Conference on Computer Design.
MiBench: A free, commercially representative embedded benchmark suite. WWC-4: Proceedings of the 2001 IEEE International Workshop on Workload Characterization.
Computer Architecture, Fourth Edition: A Quantitative Approach.
The case for compressed caching in virtual memory systems. ATEC '99: Proceedings of the USENIX Annual Technical Conference.
Compression in cache design. Proceedings of the 21st Annual International Conference on Supercomputing.
Increasing cache capacity through word filtering. Proceedings of the 21st Annual International Conference on Supercomputing.
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines. HPCA '07: Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
Zero-content augmented caches. Proceedings of the 23rd International Conference on Supercomputing.
C-pack: a high-performance microprocessor cache compression algorithm. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
SRAM leakage reduction by row/column redundancy under random within-die delay variation. IEEE Transactions on Very Large Scale Integration (VLSI) Systems.

Cited by (1):
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching. Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture.

Abstract

L2 caches are increasingly adopted in embedded systems for high performance; however, their large size increases energy consumption. We propose a low-energy, low-area L2 cache architecture that performs as well as a conventional L2 cache architecture while using 53% less area and around 40% less energy. This architecture consists of an L2 cache and a small cache called the residue cache. Lines in both the L2 cache and the residue cache are half the size of conventional L2 cache lines. Conventional L2 cache lines that compress well are stored only in the L2 cache, while poorly compressed lines are stored in both the L2 and residue caches. Although the residue cache does not retain the remaining portion of many of these poorly compressed lines, most accesses to them do not incur misses because not all of their words are needed immediately; this paper terms such accesses partial hits. The residue cache architecture consumes much less energy and area than conventional L2 cache architectures and can be combined synergistically with other schemes such as line distillation and zero-content augmented (ZCA) caches. It is also shown to perform well on a 4-way superscalar processor of the kind typically used in high-performance systems.
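To make the placement and lookup policy concrete, here is a minimal Python sketch built on several assumptions that do not come from the abstract: a 64-byte conventional line of sixteen 4-byte words; a toy zero-word compressor (a 16-bit presence bitmap plus 4 bytes per nonzero word) standing in for whatever compression scheme the paper actually uses; the first half of a poorly compressed line kept in the L2 half-line with the remainder held in the residue cache; LRU replacement in the residue cache; and no modeling of L2 evictions. The names `ResidueCacheModel`, `compressed_size`, `fill`, and `access` are hypothetical. The sketch only classifies accesses as hit, partial hit, or miss; it does not model timing, energy, or area.

```python
from collections import OrderedDict

# Assumed geometry (not from the paper): 64-byte conventional line = 16 words of 4 bytes.
WORDS_PER_LINE = 16
HALF_LINE_BYTES = 32   # L2 and residue lines are half a conventional line
BITMAP_BYTES = 2       # toy compressor: one "nonzero" bit per word


def compressed_size(words):
    """Hypothetical stand-in compressor: a 16-bit bitmap plus 4 bytes per nonzero word."""
    return BITMAP_BYTES + 4 * sum(1 for w in words if w != 0)


class ResidueCacheModel:
    """Classifies accesses as hit / partial hit / miss; L2 evictions are not modeled."""

    def __init__(self, residue_lines):
        self.l2 = {}                   # addr -> "compressed" or "split"
        self.residue = OrderedDict()   # addrs whose residue is cached, LRU first
        self.residue_lines = residue_lines

    def fill(self, addr, words):
        self.residue.pop(addr, None)   # drop any stale residue for this address
        if compressed_size(words) <= HALF_LINE_BYTES:
            self.l2[addr] = "compressed"          # whole line fits in one L2 half-line
            return
        self.l2[addr] = "split"                   # first half in L2, rest is the residue
        self.residue[addr] = True                 # newly inserted key is most recent
        if len(self.residue) > self.residue_lines:
            self.residue.popitem(last=False)      # evict least recently used residue

    def access(self, addr, word_index):
        kind = self.l2.get(addr)
        if kind is None:
            return "miss"
        if kind == "compressed":
            return "hit"
        if addr in self.residue:
            self.residue.move_to_end(addr)        # refresh LRU position
            return "hit"
        # Residue no longer cached: only the words in the L2 half-line are available.
        if word_index < WORDS_PER_LINE // 2:
            return "partial hit"
        return "miss"


model = ResidueCacheModel(residue_lines=1)
sparse = [7] + [0] * 15          # 2 + 4 = 6 bytes: fits in a half-line
dense = list(range(1, 17))       # 2 + 64 = 66 bytes: must be split
model.fill(0x100, sparse)
model.fill(0x140, dense)
model.fill(0x180, dense)         # residue of 0x140 is evicted
print(model.access(0x100, 12))   # hit
print(model.access(0x140, 3))    # partial hit
print(model.access(0x140, 12))   # miss
print(model.access(0x180, 12))   # hit
```

The point the sketch illustrates is that losing a residue line does not invalidate its L2 half: requests for words still held there are served as partial hits, and only requests for the missing half are treated as misses in this model.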