Zero-content augmented caches

Authors:
Julien Dusser;Thomas Piquet;André Seznec
Affiliations:
INRIA, Rennes, France;INRIA, Rennes, France;INRIA, Rennes, France
Venue:
Proceedings of the 23rd international conference on Supercomputing
Year:
2009

Citing 20
Cited 9

Decoupled sectored caches: conciliating low tag implementation cost

ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A fully associative software-managed cache design

Proceedings of the 27th annual international symposium on Computer architecture
Silent stores for free

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Dynamic zero compression for cache energy reduction

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value compression in data caches

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
An on-chip cache compression technique to reduce decompression overhead and design complexity

Journal of Systems Architecture: the EUROMICRO Journal
Symbiotic jobscheduling with priorities for a simultaneous multithreading processor

SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Frequent value locality and its applications

ACM Transactions on Embedded Computing Systems (TECS)
Improving System Performance with Compressed Memory

IPDPS '01 Proceedings of the 15th International Parallel & Distributed Processing Symposium
Performance of Hardware Compressed Main Memory

HPCA '01 Proceedings of the 7th International Symposium on High-Performance Computer Architecture
Adaptive Compressed Caching: Design and Implementation

SBAC-PAD '03 Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
A Unified Compressed Memory Hierarchy

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Robust Main-Memory Compression Scheme

Proceedings of the 32nd annual international symposium on Computer Architecture
RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence

Proceedings of the 32nd annual international symposium on Computer Architecture
Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking

Proceedings of the 32nd annual international symposium on Computer Architecture
Analysis of the O-GEometric History Length Branch Predictor

Proceedings of the 32nd annual international symposium on Computer Architecture
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource

Proceedings of the 15th international conference on Parallel architectures and compilation techniques
The case for compressed caching in virtual memory systems

ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
A Framework for Coarse-Grain Optimizations in the On-Chip Memory Hierarchy

Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture

Decoupled zero-compressed memory

Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers
A unified approach to eliminate memory accesses early

CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems
HICAMP: architectural support for efficient concurrency-safe shared structured data access

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Residue cache: a low-energy low-area L2 cache architecture via compression and partial hits

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Base-delta-immediate compression: practical data compression for on-chip caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Linearly compressed pages: a main memory compression framework with low complexity and low latency

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Data filter cache with word selection cache for low power embedded processor

Proceedings of the 2013 Research in Adaptive and Convergent Systems
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

It has been observed that some applications manipulate large amounts of null data. Moreover these zero data often exhibit high spatial locality. On some applications more than 20% of the data accesses concern null data blocks. Representing a null block in a cache on a standard cache line appears as a waste of resources. In this paper, we propose the Zero-Content Augmented cache, the ZCA cache. A ZCA cache consists of a conventional cache augmented with a specialized cache for memorizing null blocks, the Zero-Content cache or ZC cache. In the ZC cache, the data block is represented by its address tag and a validity bit. Moreover, as null blocks generally exhibit high spatial locality, several null blocks can be associated with a single address tag in the ZC cache. For instance, a ZC cache mapping 32MB of zero 64-byte lines uses less than 80KB of storage. Decompression of a null block is very simple, therefore read access time on the ZCA cache is in the same range as the one of a conventional cache. On applications manipulating large amount of null data blocks, such a ZC cache allows to significantly reduce the miss rate and memory traffic, and therefore to increase performance for a small hardware overhead. In particular, the write-back traffic on null blocks is limited. For applications with a low null block rate, no performance loss is observed.