State-of-the-art graphics processing units (GPUs) provide very high memory bandwidth, but the performance of many general-purpose GPU (GPGPU) workloads is still bounded by memory bandwidth. Although compression techniques have been adopted by commercial GPUs, they are used only for compressing texture and color data, not data for GPGPU workloads. Furthermore, the microarchitectural details of GPU compression are proprietary, and its performance benefits have not been previously published. In this paper, we first investigate the microarchitectural changes required to support lossless compression of data transferred between the GPU and its off-chip memory, providing higher effective bandwidth. Second, by exploiting characteristics of the floating-point numbers in many GPGPU workloads, we propose applying lossless compression to floating-point numbers after truncating their least-significant bits (i.e., lossy compression). This reduces bandwidth usage even further with very little impact on overall computational accuracy. Finally, we demonstrate that a GPU with our lossless and lossy compression techniques improves the performance of memory-bound GPGPU workloads by 26% and 41% on average, respectively.
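The lossy step described above, dropping the least-significant mantissa bits of each floating-point value before applying a lossless compressor, can be sketched in software as follows. This is a hypothetical illustration of the general idea, not the paper's hardware design; the bit count `drop_bits` and the function name are assumptions for the example.

```python
import struct

def truncate_float32(x: float, drop_bits: int) -> float:
    """Zero out the `drop_bits` least-significant mantissa bits of a
    float32 value. The resulting trailing zeros make the bit pattern
    more compressible for a downstream lossless compressor, at the
    cost of a small, bounded relative error (illustrative sketch)."""
    # Reinterpret the float32 as a 32-bit unsigned integer.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Mask off the low `drop_bits` bits of the 23-bit mantissa.
    mask = 0xFFFFFFFF & ~((1 << drop_bits) - 1)
    return struct.unpack("<f", struct.pack("<I", bits & mask))[0]
```

Dropping 8 of the 23 mantissa bits, for example, bounds the relative error near 2^-15 (about 0.003%), which matches the paper's observation that modest truncation has very little impact on overall accuracy.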