Adaptive Cache Compression for High-Performance Processors

Authors:
Alaa R. Alameldeen; David A. Wood
Affiliations:
University of Wisconsin-Madison; University of Wisconsin-Madison
Venue:
Proceedings of the 31st annual international symposium on Computer architecture
Year:
2004

Abstract

Modern processors use two or more levels of cache memories to bridge the rising disparity between processor and memory speeds. Compression can improve cache performance by increasing effective cache capacity and eliminating misses. However, decompressing cache lines also increases cache access latency, potentially degrading performance.

In this paper, we develop an adaptive policy that dynamically adapts to the costs and benefits of cache compression. We propose a two-level cache hierarchy where the L1 cache holds uncompressed data and the L2 cache dynamically selects between compressed and uncompressed storage. The L2 cache is 8-way set-associative with LRU replacement, where each set can store up to eight compressed lines but has space for only four uncompressed lines. On each L2 reference, the LRU stack depth and compressed size determine whether compression (could have) eliminated a miss or incurs an unnecessary decompression overhead. Based on this outcome, the adaptive policy updates a single global saturating counter, which predicts whether to allocate lines in compressed or uncompressed form.

We evaluate adaptive cache compression using full-system simulation and a range of benchmarks. We show that compression can improve performance for memory-intensive commercial workloads by up to 17%. However, always using compression hurts performance for low-miss-rate benchmarks, due to unnecessary decompression overhead, degrading performance by up to 18%. By dynamically monitoring workload behavior, the adaptive policy achieves comparable benefits from compression, while never degrading performance by more than 0.4%.
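
The adaptive policy described in the abstract can be illustrated with a minimal sketch. The abstract states only that each L2 reference is classified using LRU stack depth and compressed size, and that the outcome updates a single global saturating counter. Everything else below is an assumption made for illustration: the class and method names, the choice to weight updates by a miss penalty and a decompression penalty, the cycle values, the saturation bound, and the rule that a positive counter means "compress".

```python
from dataclasses import dataclass

# From the abstract: a set has room for four uncompressed lines
# and can hold up to eight compressed lines.
UNCOMPRESSED_WAYS = 4
MAX_WAYS = 8


@dataclass
class CompressionPredictor:
    """Hypothetical global saturating counter for an adaptive L2 policy."""

    # Illustrative assumptions, not values taken from the abstract.
    miss_penalty: int = 400          # cycles saved when a miss is avoided
    decompression_penalty: int = 5   # cycles lost on a needless decompression
    limit: int = 1 << 19             # saturation bound (symmetric)
    counter: int = 0

    def _add(self, delta: int) -> None:
        # Saturating update: clamp the counter to [-limit, +limit].
        self.counter = max(-self.limit, min(self.limit, self.counter + delta))

    def on_hit(self, stack_depth: int, stored_compressed: bool) -> None:
        """Record an L2 hit; stack_depth is 1-based (1 = most recently used)."""
        if not 1 <= stack_depth <= MAX_WAYS:
            raise ValueError("stack depth out of range")
        if stack_depth > UNCOMPRESSED_WAYS:
            # The line is resident only because compression made room:
            # compression eliminated a miss.
            self._add(self.miss_penalty)
        elif stored_compressed:
            # The line would have hit anyway, so decompressing it was
            # unnecessary overhead.
            self._add(-self.decompression_penalty)
        # Otherwise: an uncompressed hit in the top four, no evidence either way.

    def on_miss(self, would_fit_if_compressed: bool) -> None:
        """Record an L2 miss; the flag says whether compressed sizes would have fit."""
        if would_fit_if_compressed:
            # Compression could have eliminated this miss.
            self._add(self.miss_penalty)

    def should_compress(self) -> bool:
        # Assumed convention: positive counter favors compressed allocation.
        return self.counter > 0


if __name__ == "__main__":
    predictor = CompressionPredictor()
    predictor.on_hit(stack_depth=2, stored_compressed=True)   # needless decompression
    predictor.on_hit(stack_depth=6, stored_compressed=True)   # avoided miss
    print(predictor.counter, predictor.should_compress())
```

In this sketch, a workload whose hits cluster in the top four stack positions pushes the counter negative and turns compression off, while a workload that often reuses lines at depths five through eight pushes it positive.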