Adaptive insertion policies for managing shared caches

Authors:
Aamer Jaleel;William Hasenplaugh;Moinuddin Qureshi;Julien Sebot;Simon Steely, Jr.;Joel Emer
Affiliations:
Intel Corporation, Hudson, MA, USA;Intel Corporation, Hudson, MA, USA;IBM T. J. Watson Research Center, Yorktown Heights, NY, USA;Intel Israel Design Center, Haifa, Israel;Intel Corporation, Hudson, MA, USA;Intel Corporation, Hudson, MA, USA
Venue:
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Year:
2008

Citing 15
Cited 60

Optimal Partitioning of Cache Memory

IEEE Transactions on Computers
Symbiotic jobscheduling for a simultaneous multithreaded processor

ASPLOS IX Proceedings of the ninth international conference on Architectural support for programming languages and operating systems
Dynamic Partitioning of Shared Cache Memory

The Journal of Supercomputing
Performance evaluation of cache replacement policies for the SPEC CPU2000 benchmark suite

ACM-SE 42 Proceedings of the 42nd annual Southeast regional conference
CQoS: a framework for enabling QoS in shared caches of CMP platforms

Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture

Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Pin: building customized program analysis tools with dynamic instrumentation

Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Niagara: A 32-Way Multithreaded Sparc Processor

IEEE Micro
A Case for MLP-Aware Cache Replacement

Proceedings of the 33rd annual international symposium on Computer Architecture
Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Virtual private caches

Proceedings of the 34th annual international symposium on Computer architecture
Adaptive insertion policies for high performance caching

Proceedings of the 34th annual international symposium on Computer architecture
Cooperative cache partitioning for chip multiprocessors

Proceedings of the 21st annual international conference on Supercomputing
Feedback Directed Prefetching: Improving the Performance and Bandwidth-Efficiency of Hardware Prefetchers

HPCA '07 Proceedings of the 2007 IEEE 13th International Symposium on High Performance Computer Architecture
IBM Power5 Chip: A Dual-Core Multithreaded Processor

IEEE Micro

FlexDCP: a QoS framework for CMP architectures

ACM SIGOPS Operating Systems Review
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches

Proceedings of the 36th annual international symposium on Computer architecture
Thread criticality predictors for dynamic performance, power, and resource management in chip multiprocessors

Proceedings of the 36th annual international symposium on Computer architecture
Extending the effectiveness of 3D-stacked DRAM caches with an adaptive multi-queue policy

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Pseudo-LIFO: the foundation of a new family of replacement policies for last-level caches

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
SHARP control: controlled shared cache management in chip multiprocessors

Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture
Intra-application shared cache partitioning for multithreaded applications

Proceedings of the 15th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Augmenting cache partitioning with thread-aware insertion/promotion policies to manage shared caches

Proceedings of the 7th ACM international conference on Computing frontiers
Load balancing using dynamic cache allocation

Proceedings of the 7th ACM international conference on Computing frontiers
High performance cache replacement using re-reference interval prediction (RRIP)

Proceedings of the 37th annual international symposium on Computer architecture
Power and performance aware reconfigurable cache for CMPs

Proceedings of the Second International Forum on Next-Generation Multicore/Manycore Technologies
Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses

Proceedings of the 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis
Sampling Dead Block Prediction for Last-Level Caches

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Achieving Non-Inclusive Cache Performance with Inclusive Caches: Temporal Locality Aware (TLA) Cache Management Policies

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity

MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
Management policies analysis for multi-core shared caches

ADMA'10 Proceedings of the 6th international conference on Advanced data mining and applications - Volume Part II
Adaptive timekeeping replacement: Fine-grained capacity management for shared CMP caches

ACM Transactions on Architecture and Code Optimization (TACO)
Studying inter-core data reuse in multicores

Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
Vantage: scalable and efficient fine-grain cache partitioning

Proceedings of the 38th annual international symposium on Computer architecture
Bypass and insertion algorithms for exclusive last-level caches

Proceedings of the 38th annual international symposium on Computer architecture
Studying inter-core data reuse in multicores

ACM SIGMETRICS Performance Evaluation Review - Performance evaluation review
Dynamic access distance driven cache replacement

ACM Transactions on Architecture and Code Optimization (TACO)
Evaluating placement policies for managing capacity sharing in CMP architectures with private caches

ACM Transactions on Architecture and Code Optimization (TACO)
Enhanced adaptive insertion policy for shared caches

APPT'11 Proceedings of the 9th international conference on Advanced parallel processing technologies
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation

Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
The gradient-based cache partitioning algorithm

ACM Transactions on Architecture and Code Optimization (TACO) - HIPEAC Papers
Paging for multi-core shared caches

Proceedings of the 3rd Innovations in Theoretical Computer Science Conference
CRUISE: cache replacement and utility-aware scheduling

ASPLOS XVII Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems
Bubble-Up: increasing utilization in modern warehouse scale computers via sensible co-locations

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
SHiP: signature-based hit predictor for high performance caching

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
PACMan: prefetch-aware cache management for high performance caching

Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Scalable shared-cache management by containing thrashing workloads

HiPEAC'10 Proceedings of the 5th international conference on High Performance Embedded Architectures and Compilers
A capacity-efficient insertion policy for dynamic cache resizing mechanisms

Proceedings of the 9th conference on Computing Frontiers
Courteous cache sharing: being nice to others in capacity management

Proceedings of the 49th Annual Design Automation Conference
Reducing last level cache pollution through OS-level software-controlled region-based partitioning

Proceedings of the 27th Annual ACM Symposium on Applied Computing
Unified memory optimizing architecture: memory subsystem control with a unified predictor

Proceedings of the 26th ACM international conference on Supercomputing
Locality & utility co-optimization for practical capacity management of shared last level caches

Proceedings of the 26th ACM international conference on Supercomputing
FLEXclusion: balancing cache capacity and on-chip bandwidth via flexible exclusion

Proceedings of the 39th Annual International Symposium on Computer Architecture
Probabilistic shared cache management (PriSM)

Proceedings of the 39th Annual International Symposium on Computer Architecture
Introducing hierarchy-awareness in replacement and bypass algorithms for last-level caches

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Efficient techniques for predicting cache sharing and throughput

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
The evicted-address filter: a unified mechanism to address both cache pollution and thrashing

Proceedings of the 21st international conference on Parallel architectures and compilation techniques
Survey of scheduling techniques for addressing shared resources in multicore processors

ACM Computing Surveys (CSUR)
Per-thread cycle accounting in multicore processors

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
Exploiting reuse locality on inclusive shared last-level caches

ACM Transactions on Architecture and Code Optimization (TACO) - Special Issue on High-Performance Embedded Architectures and Compilers
An application-aware cache replacement policy for last-level caches

ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
Cache-Conscious Wavefront Scheduling

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Improving Cache Management Policies Using Dynamic Reuse Distances

MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
A survey on cache tuning from a power/energy perspective

ACM Computing Surveys (CSUR)
Adaptive cache management for a combined SRAM and DRAM cache hierarchy for multi-cores

Proceedings of the Conference on Design, Automation and Test in Europe
PCASA: probabilistic control-adjusted selective allocation for shared caches

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Dynamic cache management in multi-core architectures through run-time adaptation

DATE '12 Proceedings of the Conference on Design, Automation and Test in Europe
Adaptive virtual channel partitioning for network-on-chip in heterogeneous architectures

ACM Transactions on Design Automation of Electronic Systems (TODAES) - Special Section on Networks on Chip: Architecture, Tools, and Methodologies
Imbalanced cache partitioning for balanced data-parallel programs

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
The reuse cache: downsizing the shared last-level cache

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Efficient management of last-level caches in graphics processors for 3D scene rendering workloads

Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
Call Paths for Pin Tools

Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization
WADE: Writeback-aware dynamic cache management for NVM-based main memory system

ACM Transactions on Architecture and Code Optimization (TACO)
Reducing inter-core cache contention with an adaptive bank mapping policy in DRAM cache

Proceedings of the Ninth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis

Quantified Score

Hi-index	0.00

Visualization

Abstract

Chip Multiprocessors (CMPs) allow different applications to concurrently execute on a single chip. When applications with differing demands for memory compete for a shared cache, the conventional LRU replacement policy can significantly degrade cache performance when the aggregate working set size is greater than the shared cache. In such cases, shared cache performance can be significantly improved by preserving the entire working set of applications that can co-exist in the cache and preserving some portion of the working set of the remaining applications. This paper investigates the use of adaptive insertion policies to manage shared caches. We show that directly extending the recently proposed dynamic insertion policy (DIP) is inadequate for shared caches since DIP is unaware of the characteristics of individual applications. We propose Thread-Aware Dynamic Insertion Policy (TADIP) that can take into account the memory requirements of each of the concurrently executing applications. Our evaluation with multi-programmed workloads for 2-core, 4-core, 8-core, and 16-core CMPs show that a TADIP-managed shared cache improves overall throughput by as much as 94%, 64%, 26%, and 16% respectively (on average 14%, 18%, 15%, and 17%) over the baseline LRU policy. The performance benefit of TADIP is 2.6x compared to DIP and 1.3x compared to the recently proposed Utility-based Cache Partitioning (UCP) scheme. We also show that a TADIP-managed shared cache provides performance benefits similar to doubling the size of an LRU-managed cache. Furthermore, TADIP requires a total storage overhead of less than two bytes per core, does not require changes to the existing cache structure, and performs similar to LRU for LRU friendly workloads.