Using compression to improve chip multiprocessor performance

  • Authors:
  • David A. Wood; Alaa R. Alameldeen

  • Affiliations:
  • The University of Wisconsin - Madison; The University of Wisconsin - Madison

  • Year:
  • 2006

Abstract

Chip multiprocessors (CMPs) combine multiple processors on a single die, typically with private level-one caches and a shared level-two cache. The increasing number of processor cores in a CMP increases the demand on two critical resources: shared L2 cache capacity and off-chip pin bandwidth. This demand is further exacerbated by latency-hiding techniques such as hardware prefetching. In this dissertation, we explore using compression to effectively increase cache and pin bandwidth resources and, ultimately, CMP performance. We identify two distinct and complementary designs where compression can improve CMP performance: cache compression and link compression. Cache compression stores compressed lines in the cache, potentially increasing the effective cache size, reducing off-chip misses, and improving performance. Unfortunately, decompression overhead can lengthen cache hit latencies, possibly degrading performance. Link (i.e., off-chip interconnect) compression compresses communication messages before they are sent to or received from off-chip system components, thereby increasing the effective pin bandwidth and improving performance for bandwidth-limited configurations. While compression can have a positive impact on CMP performance, practical implementations raise several concerns. In this dissertation, we make five contributions that address these concerns. We propose a compressed L2 cache design based on a simple compression algorithm with a low decompression overhead. We develop an adaptive compression scheme that dynamically adapts to the costs and benefits of cache compression, employing compression only when it helps performance. We show that cache compression and link compression combine to improve CMP performance for commercial and (some) scientific workloads.
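The "simple compression algorithm with a low decompression overhead" can be illustrated with a frequent-pattern-style word compressor: each 32-bit word gets a short tag naming a common pattern, and only the bits that cannot be reconstructed from the pattern are stored. The particular pattern set and 3-bit tag below are illustrative assumptions for this sketch, not the dissertation's exact hardware encoding.

```python
# Illustrative frequent-pattern-style compressor for 32-bit words.
# The pattern set (zero, 8-bit sign-extended, 16-bit sign-extended,
# uncompressed) and the 3-bit tag width are assumptions for this sketch.

def compress_word(w):
    """Encode one 32-bit word as (tag, payload, payload_bits)."""
    s = w - (1 << 32) if w & 0x80000000 else w  # interpret as signed
    if w == 0:
        return (0, 0, 0)            # zero word: tag only, no payload
    if -128 <= s < 128:
        return (1, w & 0xFF, 8)     # sign-extends from 8 bits
    if -32768 <= s < 32768:
        return (2, w & 0xFFFF, 16)  # sign-extends from 16 bits
    return (3, w, 32)               # incompressible: stored verbatim

def decompress_word(tag, payload):
    """Invert compress_word: just sign extension, no tables or state."""
    if tag == 0:
        return 0
    if tag == 1:
        v = payload - 256 if payload & 0x80 else payload
    elif tag == 2:
        v = payload - 65536 if payload & 0x8000 else payload
    else:
        return payload
    return v & 0xFFFFFFFF

def compressed_size_bits(line):
    """Total bits for a cache line: 3-bit tag per word plus its payload."""
    return sum(3 + compress_word(w)[2] for w in line)
```

Because decompression is only sign extension and concatenation, it can plausibly finish in a few cycles, which is the property a compressed L2 needs to keep the hit-latency penalty small.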
We show that compression interacts in a strong positive way with hardware prefetching: a system that implements both compression and hardware prefetching can achieve a higher speedup than the product of the speedups of each scheme alone. We provide a simple analytical model that offers qualitative intuition into the trade-off between cores, caches, communication, and compression, and we use full-system simulation to quantify this trade-off for a set of commercial workloads.
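The kind of cores/caches/communication/compression trade-off such a model captures can be sketched numerically. Every constant below (the power-law miss exponent, compression ratio, miss and decompression penalties, L2 access rate) is an assumed illustrative value, not a number from the dissertation; the point is only the shape of the trade-off.

```python
# Toy analytical model of the cores / cache / compression trade-off.
# All constants are assumed, illustrative values.

def cpi(cores, total_cache_mb, compressed):
    """Per-core cycles per instruction under an additive miss-penalty model."""
    base_cpi = 1.0                  # assumed ideal CPI
    miss_penalty = 400.0            # assumed cycles per off-chip miss
    decompression_penalty = 5.0     # assumed extra cycles per L2 access
    l2_accesses_per_instr = 0.05    # assumed L2 access frequency
    compression_ratio = 1.6         # assumed effective capacity gain
    m0, c0, alpha = 0.02, 1.0, 0.5  # power-law miss model: m = m0*(C/c0)**-alpha

    per_core_mb = total_cache_mb / cores
    effective_mb = per_core_mb * (compression_ratio if compressed else 1.0)
    misses_per_instr = m0 * (effective_mb / c0) ** -alpha
    hit_overhead = decompression_penalty * l2_accesses_per_instr if compressed else 0.0
    return base_cpi + miss_penalty * misses_per_instr + hit_overhead

def throughput(cores, total_cache_mb, compressed):
    """Aggregate instructions per cycle across all cores."""
    return cores / cpi(cores, total_cache_mb, compressed)
```

Under these assumptions, compression wins when many cores contend for a capacity-constrained cache (e.g., 8 cores sharing 4 MB), but the fixed decompression overhead makes it a net loss when the working set already fits (e.g., 1 core with 64 MB), which is exactly the situation an adaptive scheme must detect.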