As processor speeds increase, the on-chip memory hierarchy will remain crucial to performance. Unfortunately, simply increasing the size of on-chip caches yields diminishing returns, and memory-bound applications may still suffer from limited off-chip bandwidth. This paper focuses on memory-link compression schemes. A first contribution is a framework for identifying the nature of the value locality that published schemes exploit. This framework is then used to establish quantitatively what type of value locality each compression scheme exploits. We find that as many as 40% of the values transferred in integer, media, and commercial applications are small integers that can be coded using fewer than 8 bits. By leveraging small-value locality, 35% of the bandwidth can be freed up. Another significant fraction of the values either forms clusters in the value space or belongs to a fairly small set of frequent isolated values. By also leveraging this category, one can free up 70% of the bandwidth. Finally, we contribute a new compression scheme that exploits multiple value-locality categories and is shown to free up 75% of the bandwidth.
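The abstract does not specify an encoding at this level of detail, but a minimal sketch in C can illustrate two of the value-locality categories it names: small integers that fit in a short payload, and a small table of frequent isolated values. Everything below is an illustrative assumption, not the paper's scheme: the names encode_word and freq_table, the tag widths, the table size, and the table contents are all hypothetical.

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical tag-based word encoder sketching two value-locality
     * categories from the abstract: small signed integers and a small
     * table of frequent isolated values. Anything matching neither
     * class is sent as an uncompressed literal. Tag widths and the
     * table size are illustrative choices. */

    #define FREQ_TABLE_SIZE 16

    static const uint32_t freq_table[FREQ_TABLE_SIZE] = {
        0x00000000, 0xFFFFFFFF, 0x00000001, 0x80000000,
        /* remaining entries would be filled by profiling */
    };

    /* Number of bits needed to transfer v under this sketch:
     * 1-bit tag + 8-bit payload for small signed integers,
     * 2-bit tag + 4-bit table index for frequent values,
     * 2-bit tag + 32-bit literal otherwise. */
    static unsigned encode_word(uint32_t v)
    {
        int32_t s = (int32_t)v;

        if (s >= -128 && s <= 127)          /* small-integer class */
            return 1 + 8;

        for (int i = 0; i < FREQ_TABLE_SIZE; i++)
            if (freq_table[i] == v)         /* frequent-value class */
                return 2 + 4;

        return 2 + 32;                      /* uncompressed literal */
    }

    int main(void)
    {
        uint32_t trace[] = { 0, 1, 42, 0xFFFFFFFF, 0xDEADBEEF, 7, 100000 };
        size_t n = sizeof trace / sizeof trace[0];
        unsigned long raw = 0, packed = 0;

        for (size_t i = 0; i < n; i++) {
            raw += 32;                      /* uncompressed link cost */
            packed += encode_word(trace[i]);
        }
        printf("raw: %lu bits, encoded: %lu bits (%.0f%% freed)\n",
               raw, packed,
               100.0 * ((double)raw - (double)packed) / (double)raw);
        return 0;
    }

Note that literals cost slightly more than 32 bits under any tagged encoding, so the scheme only pays off when, as the paper measures, a large share of transferred values falls into the compressible classes.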