Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Authors:
Gennady Pekhimenko;Vivek Seshadri;Yoongu Kim;Hongyi Xin;Onur Mutlu;Phillip B. Gibbons;Michael A. Kozuch;Todd C. Mowry
Affiliations:
Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Carnegie Mellon University;Intel Labs Pittsburgh;Intel Labs Pittsburgh;Carnegie Mellon University
Venue:
Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Year:
2013

Citing 29
Cited 0

Dynamic base register caching: a technique for reducing address bus width

ISCA '91 Proceedings of the 18th annual international symposium on Computer architecture
Frequent value compression in data caches

Proceedings of the 33rd annual ACM/IEEE international symposium on Microarchitecture
Frequent value locality and value-centric data cache design

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Simics: A Full System Simulation Platform

Computer
Pinnacle: IBM MXT in a Memory Controller Chip

IEEE Micro
Compressed caching and modern virtual memory simulation

Compressed caching and modern virtual memory simulation
Adaptive Compressed Caching: Design and Implementation

SBAC-PAD '03 Proceedings of the 15th Symposium on Computer Architecture and High Performance Computing
Energy Management for Commercial Servers

Computer
Memory management for high-performance applications

Memory management for high-performance applications
Effective stream-based and execution-based data prefetching

Proceedings of the 18th annual international conference on Supercomputing
Adaptive Cache Compression for High-Performance Processors

Proceedings of the 31st annual international symposium on Computer architecture
Frequent value encoding for low power data buses

ACM Transactions on Design Automation of Electronic Systems (TODAES)
Pinpointing Representative Portions of Large Intel® Itanium® Programs with Dynamic Instrumentation

Proceedings of the 37th annual IEEE/ACM International Symposium on Microarchitecture
A Unified Compressed Memory Hierarchy

HPCA '05 Proceedings of the 11th International Symposium on High-Performance Computer Architecture
A Robust Main-Memory Compression Scheme

Proceedings of the 32nd annual international symposium on Computer Architecture
The case for compressed caching in virtual memory systems

ATEC '99 Proceedings of the annual conference on USENIX Annual Technical Conference
Quantifying the cost of context switch

Proceedings of the 2007 workshop on Experimental computer science
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
Memory-Link Compression Schemes: A Value Locality Perspective

IEEE Transactions on Computers
Zero-content augmented caches

Proceedings of the 23rd international conference on Supercomputing
Memory expansion technology (MXT): software support and performance

IBM Journal of Research and Development
McPAT: an integrated power, area, and timing modeling framework for multicore and manycore architectures

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
C-pack: a high-performance microprocessor cache compression algorithm

IEEE Transactions on Very Large Scale Integration (VLSI) Systems
A universal algorithm for sequential data compression

IEEE Transactions on Information Theory
Buffer-on-board memory systems

Proceedings of the 39th Annual International Symposium on Computer Architecture
The dynamic granularity memory system

Proceedings of the 39th Annual International Symposium on Computer Architecture
Lossless and lossy memory I/O link compression for improving performance of GPGPU workloads

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Base-delta-immediate compression: practical data compression for on-chip caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques

Quantified Score

Hi-index	0.00

Visualization

Abstract

Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient. By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression--Linearly Compressed Pages (LCP)--that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously-proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression. Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).