Base-delta-immediate compression: practical data compression for on-chip caches

Authors:
Gennady Pekhimenko;Vivek Seshadri;Onur Mutlu;Phillip B. Gibbons;Michael A. Kozuch;Todd C. Mowry
Affiliations:
Carnegie Mellon University, Pittsburgh, Pennsylvania, USA;Carnegie Mellon University, Pittsburgh, Pennsylvania, USA;Carnegie Mellon University, Pittsburgh, Pennsylvania, USA;Intel Labs Pittsburgh, Pittsburgh, Pennsylvania, USA;Intel Labs Pittsburgh, Pittsburgh, Pennsylvania, USA;Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Venue:
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Year:
2012

Citing 28
Cited 3

Dynamic base register caching: a technique for reducing address bus width

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
The predictability of data values

MICRO 30 Proceedings of the 30th annual ACM/IEEE international symposium on Microarchitecture
A fully associative software-managed cache design

Proceedings of the 27th annual international symposium on Computer architecture
Dynamic zero compression for cache energy reduction

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value compression in data caches

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value locality and value-centric data cache design

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
Information content of CPU memory referencing behavior

ISCA '77 Proceedings of the 4th annual symposium on Computer architecture
Exploiting Value Locality in Physical Register Files

Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
A Unified Compressed Memory Hierarchy

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Robust Main-Memory Compression Scheme

Proceedings of the 32nd annual international symposium on Computer Architecture
The V-Way Cache: Demand Based Associativity via Global Replacement

Proceedings of the 32nd annual international symposium on Computer Architecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
The case for compressed caching in virtual memory systems

ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
DHTC: an effective DXTC-based HDR texture compression scheme

Proceedings of the 23rd ACM SIGGRAPH/EUROGRAPHICS symposium on Graphics hardware
Zero-content augmented caches

Proceedings of the 23rd international conference on Supercomputing
Dynamic data compression in multi-hop wireless networks

Proceedings of the eleventh international joint conference on Measurement and modeling of computer systems
Zero-Value Caches: Cancelling Loads that Return Zero

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Memory Performance and Cache Coherency Effects on an Intel Nehalem Multiprocessor System

PACT '09 Proceedings of the 2009 18th International Conference on Parallel Architectures and Compilation Techniques
Memory expansion technology (MXT): software support and performance

IBM Journal of Research and Development
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture
Characterization and exploitation of narrow-width loads: the narrow-width cache approach

CASES '10 Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems
C-pack: a high-performance microprocessor cache compression algorithm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A universal algorithm for sequential data compression

IEEE Transactions on Information Theory

Linearly compressed pages: a main memory compression framework with low complexity and low latency

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Decoupled compressed cache: exploiting spatial locality for energy-optimized compressed caching

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

Cache compression is a promising technique to increase on-chip cache capacity and to decrease on-chip and off-chip bandwidth usage. Unfortunately, directly applying well-known compression algorithms (usually implemented in software) leads to high hardware complexity and unacceptable decompression/compression latencies, which in turn can negatively affect performance. Hence, there is a need for a simple yet efficient compression technique that can effectively compress common in-cache data patterns, and has minimal effect on cache access latency. In this paper, we introduce a new compression algorithm called Base-Delta-Immediate (BΔI) compression, a practical technique for compressing data in on-chip caches. The key idea is that, for many cache lines, the values within the cache line have a low dynamic range - i.e., the differences between values stored within the cache line are small. As a result, a cache line can be represented using a base value and an array of differences whose combined size is much smaller than the original cache line (we call this the base+delta encoding). Moreover, many cache lines intersperse such base+delta values with small values - our BΔI technique efficiently incorporates such immediate values into its encoding. Compared to prior cache compression approaches, our studies show that BΔI strikes a sweet-spot in the tradeoff between compression ratio, decompression/compression latencies, and hardware complexity. Our results show that BΔI compression improves performance for both single-core (8.1% improvement) and multi-core workloads (9.5% / 11.2% improvement for two/four cores). For many applications, BΔI provides the performance benefit of doubling the cache size of the baseline system, effectively increasing average cache capacity by 1.53X.