Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches

Authors: Moinuddin K. Qureshi; Yale N. Patt
Affiliations: University of Texas at Austin; University of Texas at Austin
Venue: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Year: 2006

Abstract

This paper investigates the problem of partitioning a shared cache between multiple concurrently executing applications. The commonly used LRU policy implicitly partitions a shared cache on a demand basis, giving more cache resources to the application with high demand and fewer to the application with low demand. However, a higher demand for cache resources does not always correlate with higher performance from additional cache resources. For performance, it is better to invest cache resources in the application that benefits more from them rather than in the application that demands more of them. This paper proposes utility-based cache partitioning (UCP), a low-overhead, runtime mechanism that partitions a shared cache between multiple applications according to the reduction in cache misses each application is likely to obtain for a given amount of cache resources. The proposed mechanism monitors each application at runtime using a novel, cost-effective hardware circuit that requires less than 2 kB of storage. A partitioning algorithm uses the information collected by the monitoring circuits to decide how much of the cache to allocate to each application. Our evaluation, with 20 multiprogrammed workloads, shows that UCP improves the performance of a dual-core system by up to 23%, and by 11% on average, over LRU-based cache partitioning.
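The abstract names two components: a per-application monitor that estimates how many misses each application would avoid for a given amount of cache, and a partitioning algorithm that allocates the cache from that information. The Python sketch below illustrates both under simplifying assumptions. The monitor is a toy single-set LRU stack that counts hits at each recency position (by LRU's stack property, an allocation of k ways would hit on exactly the accesses counted at positions 0..k-1); it is not the paper's hardware circuit. The allocator is a lookahead-style greedy rule that hands out ways by maximum average marginal utility, which may differ in detail from the paper's partitioning algorithm. The function names and the one-way minimum per application are illustrative assumptions.

```python
from typing import Dict, List, Sequence


def stack_distance_hits(accesses: Sequence[int], ways: int) -> List[int]:
    """Toy utility monitor (hypothetical): simulate one fully associative LRU set
    with `ways` entries and count hits at each recency position (index 0 = MRU).
    By the LRU stack property, an allocation of k ways would have hit on exactly
    the accesses counted at positions 0..k-1."""
    stack: List[int] = []  # recency stack, most recently used first
    hits = [0] * ways
    for addr in accesses:
        if addr in stack:
            pos = stack.index(addr)
            hits[pos] += 1
            stack.pop(pos)
        elif len(stack) == ways:
            stack.pop()  # miss in a full set: evict the LRU entry
        stack.insert(0, addr)  # the accessed block becomes MRU
    return hits


def lookahead_partition(hits: Dict[str, List[int]], total_ways: int) -> Dict[str, int]:
    """Assign ways by maximum average marginal utility (hypothetical helper).

    `hits[app][i]` is the number of hits the monitor saw at recency position i;
    every list must have `total_ways` entries. Each application starts with one
    way (an assumed minimum) and the remaining ways are handed out in blocks.
    """
    if not hits:
        raise ValueError("no applications to partition between")
    if any(len(h) != total_ways for h in hits.values()):
        raise ValueError("each hit-counter list must have total_ways entries")
    alloc = {app: 1 for app in hits}
    balance = total_ways - len(hits)
    if balance < 0:
        raise ValueError("fewer ways than applications")
    while balance > 0:
        best_app, best_mu, best_extra = None, -1.0, 0
        for app, h in hits.items():
            cur = alloc[app]
            # Look ahead over every block size the remaining balance allows,
            # so a non-convex utility curve is not cut off after one way.
            for extra in range(1, balance + 1):
                mu = sum(h[cur:cur + extra]) / extra  # average hits gained per way
                if mu > best_mu:
                    best_app, best_mu, best_extra = app, mu, extra
        alloc[best_app] += best_extra
        balance -= best_extra
    return alloc


if __name__ == "__main__":
    print(stack_distance_hits([1, 2, 1, 2, 3, 1], ways=2))  # [0, 2]
    print(lookahead_partition({"A": [5, 0, 0, 30, 0, 0], "B": [5, 4, 3, 2, 1, 0]}, 6))
    # {'A': 4, 'B': 2}
```

In the second call, looking several ways ahead lets application A claim the block of ways it needs to capture its 30-hit position (44 hits in total). Adding one way at a time would instead give every spare way to B, because A's next single way shows zero gain, leaving only 20 hits.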