Dynamic Partitioning of Shared Cache Memory

Authors:
G. E. Suh;L. Rudolph;S. Devadas
Affiliations:
Massachusetts Institute of Technology suh@mit.edu;Massachusetts Institute of Technology rudolph@mit.edu;Massachusetts Institute of Technology devadas@mit.edu
Venue:
The Journal of Supercomputing
Year:
2004

Abstract

This paper proposes dynamic cache partitioning among simultaneously executing processes/threads. We present a general partitioning scheme that can be applied to set-associative caches. Since the memory reference characteristics of processes/threads can change over time, our method collects their cache miss characteristics at run-time. The workload is also determined at run-time by the operating system scheduler. Our scheme combines these two pieces of information and partitions the cache among the executing processes/threads. Partition sizes are varied dynamically to reduce the total number of misses. The partitioning scheme has been evaluated using a processor simulator modeling a two-processor CMP system. The results show that the scheme can improve the total IPC significantly over the standard least recently used (LRU) replacement policy. In one case, partitioning doubles the total IPC over standard LRU. Our results show that smart cache management and scheduling are essential to achieve high performance with shared cache memory.
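
The abstract does not spell out how partition sizes are chosen, so the following is only an illustrative sketch of one plausible approach: a greedy allocation that hands out cache ways one at a time to whichever process would save the most misses from the next way. The function names (`partition_ways`, `total_misses`), the way-granularity of the partition, and the miss-curve numbers are all assumptions made for this example and are not taken from the paper. The greedy choice is only guaranteed to minimize total misses when each miss curve has diminishing returns (is convex).

```python
from typing import List


def partition_ways(miss_curves: List[List[int]], total_ways: int) -> List[int]:
    """Greedily split `total_ways` cache ways among processes.

    miss_curves[p][w] is the (hypothetical) number of misses process p
    would incur if given w ways, for w = 0..total_ways. In a real system
    these curves would be estimated at run-time from miss counters.
    """
    for curve in miss_curves:
        if len(curve) < total_ways + 1:
            raise ValueError("each miss curve needs total_ways + 1 entries")

    alloc = [0] * len(miss_curves)
    for _ in range(total_ways):
        # Marginal gain: misses saved by giving process p one more way.
        def gain(p: int) -> int:
            return miss_curves[p][alloc[p]] - miss_curves[p][alloc[p] + 1]

        best = max(range(len(miss_curves)), key=gain)
        alloc[best] += 1
    return alloc


def total_misses(miss_curves: List[List[int]], alloc: List[int]) -> int:
    """Sum of misses across processes under a given allocation."""
    return sum(curve[w] for curve, w in zip(miss_curves, alloc))


if __name__ == "__main__":
    # Made-up miss curves for two processes sharing a 4-way cache.
    curves = [
        [100, 60, 40, 30, 25],  # process A: benefits strongly from cache
        [80, 68, 62, 59, 58],   # process B: benefits less
    ]
    alloc = partition_ways(curves, total_ways=4)
    print(alloc, total_misses(curves, alloc))
```

In a dynamic setting, a routine like this would be re-run periodically with freshly measured miss curves and the current set of scheduled processes, so that partition sizes follow changes in program behavior.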