Proceedings of the 21st annual international conference on Supercomputing
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
A compiler-directed data prefetching scheme for chip multiprocessors
Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming
Using data compression for increasing memory system utilization
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Adaptive prefetching for shared cache based chip multiprocessors
Proceedings of the Conference on Design, Automation and Test in Europe
C-pack: a high-performance microprocessor cache compression algorithm
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
PACMan: prefetch-aware cache management for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Hi-index | 0.00 |
In chip multiprocessors (CMPs), multiple cores compete for shared resources such as on-chip caches and off-chip pin bandwidth. Stride-based hardware prefetching increases demand for these resources, causing contention that can degrade performance (up to 35% for one of our benchmarks). In this paper, we first show that cache and link (off-chip interconnect) compression can increase the effective cache capacity (thereby reducing off-chip misses) and increase the effective off-chip bandwidth (reducing contention). On an 8-processor CMP with no prefetching, compression improves performance by up to 18% for commercial workloads. Second, we propose a simple adaptive prefetching mechanism that uses cache compression's extra tags to detect useless and harmful prefetches. Furthermore, in the central result of this paper, we show that compression and prefetching interact in a strong positive way, resulting in combined performance improvement of 10-51% for seven of our eight workloads.