Enhanced adaptive insertion policy for shared caches

Authors:
Chongmin Li;Dongsheng Wang;Yibo Xue;Haixia Wang;Xi Zhang
Affiliations:
Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China;Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China;Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China;Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China;Tsinghua National Laboratory for Information Science and Technology, Department of Computer Science & Technology, Tsinghua University, Beijing, China
Venue:
APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Year:
2011

Citing 13
Cited 0

Optimal Partitioning of Cache Memory

IEEE Transactions on Computers
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
LRFU: A Spectrum of Policies that Subsumes the Least Recently Used and Least Frequently Used Policies

IEEE Transactions on Computers
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
ASR: Adaptive Selective Replication for CMP Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Adaptive insertion policies for managing shared caches

Proceedings of the 17th international conference on Parallel architectures and compilation techniques
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture
A Cache Replacement Policy Using Adaptive Insertion and Re-reference Prediction

SBAC-PAD '10 Proceedings of the 2010 22nd International Symposium on Computer Architecture and High Performance Computing

Quantified Score

Hi-index	0.00

Visualization

Abstract

The LRU replacement policy is commonly used in the lastlevel caches of multiprocessors. However, LRU policy does not work well for memory intensive workloads which working set are greater than the available cache size. When a new arrival cache block is inserted at the MRU position, it may never be reused until being evicted from the cache but occupy the cache space for a long time during its movement from the MRU to the LRU position. This results in inefficient use of cache space. If we insert a new cache block at the LRU position directly, the cache performance can be improved by keeping some fraction of the working sets is retained in the caches. In this work, we propose Enhanced Dynamic Insertion Policy (EDIP) and Thread Aware Enhanced Dynamic Insertion Policy (TAEDIP) which can adjust the probability of insertion at MRU by set dueling. The runtime information of the previous and the next BIP level are gathered and compared with current level to choose an appropriate BIP level. At the same time, access frequency is used to choose a victim. In this way, our work can get less miss rate than LRU for workloads with large work set. For workloads with small working set, the miss rate of our design is close to LRU replacement policy. Simulation results in single core configuration with 1MB 16-way LLC show that EDIP reduces CPI over LRU and DIP by an average of 11.4% and 1.8% respectively. On quad-core configuration with 4MB 16-way LLC. TAEDIP improves the performance on the weighted speedup metric by 11.2% over LRU and 3.7% over TADIP on average. For fairness metric, TAEDIP improves the performance by 11.2% over LRU and 2.6% over TADIP on average.