On the inclusion properties for multi-level cache hierarchies
ISCA '88 Proceedings of the 15th Annual International Symposium on Computer architecture
Tradeoffs in two-level on-chip caching
ISCA '94 Proceedings of the 21st annual international symposium on Computer architecture
A modified approach to data cache management
Proceedings of the 28th annual international symposium on Microarchitecture
Run-time adaptive cache hierarchy management via reference analysis
Proceedings of the 24th annual international symposium on Computer architecture
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Orion: a power-performance simulator for interconnection networks
Proceedings of the 35th annual ACM/IEEE international symposium on Microarchitecture
Power-driven Design of Router Microarchitectures in On-chip Networks
Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
Interconnect-Aware Coherence Protocols for Chip Multiprocessors
Proceedings of the 33rd annual international symposium on Computer Architecture
Performance evaluation of exclusive cache hierarchies
ISPASS '04 Proceedings of the 2004 IEEE International Symposium on Performance Analysis of Systems and Software
Adaptive insertion policies for high performance caching
Proceedings of the 34th annual international symposium on Computer architecture
Counter-Based Cache Replacement and Bypassing Algorithms
IEEE Transactions on Computers
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Reducing Network-on-Chip energy consumption through spatial locality speculation
NOCS '11 Proceedings of the Fifth ACM/IEEE International Symposium on Networks-on-Chip
Bypass and insertion algorithms for exclusive last-level caches
Proceedings of the 38th annual international symposium on Computer architecture
A case for heterogeneous on-chip interconnects for CMPs
Proceedings of the 38th annual international symposium on Computer architecture
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches
Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Hi-index | 0.00 |
Exclusive last-level caches (LLCs) reduce memory accesses by effectively utilizing cache capacity. However, they require excessive on-chip bandwidth to support frequent insertions of cache lines on eviction from upper-level caches. Non-inclusive caches, on the other hand, have the advantage of using the on-chip bandwidth more effectively but suffer from a higher miss rate. Traditionally, the decision to use the cache as exclusive or non-inclusive is made at design time. However, the best option for a cache organization depends on application characteristics, such as working set size and the amount of traffic consumed by LLC insertions. This paper proposes FLEXclusion, a design that dynamically selects between exclusion and non-inclusion depending on workload behavior. With FLEXclusion, the cache behaves like an exclusive cache when the application benefits from extra cache capacity, and it acts as a non-inclusive cache when additional cache capacity is not useful, so that it can reduce on-chip bandwidth. FLEXclusion leverages the observation that both non-inclusion and exclusion rely on similar hardware support, so our proposal can be implemented with negligible hardware changes. Our evaluations show that a FLEXclusive cache reduces the on-chip LLC insertion traffic by 72.6% compared to an exclusive design and improves performance by 5.9% compared to a non-inclusive design.