Dynamic cache partitioning based on the MLP of cache misses

Authors:
Miquel Moreto; Francisco J. Cazorla; Alex Ramirez; Mateo Valero
Affiliations:
Universitat Politècnica de Catalunya, DAC, Barcelona, Spain and HiPEAC European Network of Excellence; Barcelona Supercomputing Center, Centro Nacional de Supercomputación, Spain; Universitat Politècnica de Catalunya, DAC, Barcelona, Spain and HiPEAC European Network of Excellence and Barcelona Supercomputing Center, Centro Nacional de Supercomputación, Spain; Universitat Politècnica de Catalunya, DAC, Barcelona, Spain and HiPEAC European Network of Excellence and Barcelona Supercomputing Center, Centro Nacional de Supercomputación, Spain
Venue:
Transactions on high-performance embedded architectures and compilers III
Year:
2011

Abstract

Dynamic partitioning of shared caches has been proposed to improve on the performance of traditional eviction policies in modern multithreaded architectures. All existing Dynamic Cache Partitioning (DCP) algorithms work on the number of misses caused by each thread and treat all misses equally. However, it has been shown that cache misses have a different impact on performance depending on their distribution. Clustered misses share their miss penalty because they can be served in parallel, while isolated misses have a greater impact on performance because their memory latency is not shared with other misses. We take this fact into account and propose a new DCP algorithm that treats misses differently depending on their influence on performance. Our proposal obtains improvements over traditional eviction policies of up to 63.9% (10.6% on average), and it also outperforms previous DCP proposals by up to 15.4% (4.1% on average) in a four-core architecture. Our proposal reaches the same performance as a 50% larger shared cache. Finally, we present a practical implementation of our proposal that requires less than 8 KB of storage.
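
The distinction between clustered and isolated misses can be illustrated with a small sketch. It is not the algorithm proposed in the paper. It assumes one simple cost model, in which each cycle of memory latency is split evenly among the misses outstanding in that cycle, and a brute-force search over way allocations. The function names (`mlp_costs`, `best_partition`), the 100-cycle latency, and the per-thread miss curves are all hypothetical values chosen for illustration.

```python
from itertools import product


def mlp_costs(misses):
    """Return a cost per miss, sharing each cycle among outstanding misses.

    misses: list of (start_cycle, end_cycle) half-open intervals.
    Assumed model: a cycle with N outstanding misses adds 1/N to each.
    """
    costs = [0.0] * len(misses)
    if not misses:
        return costs
    first = min(start for start, _ in misses)
    last = max(end for _, end in misses)
    for cycle in range(first, last):
        outstanding = [i for i, (s, e) in enumerate(misses) if s <= cycle < e]
        for i in outstanding:
            costs[i] += 1.0 / len(outstanding)
    return costs


def best_partition(curves, total_ways):
    """Pick the way allocation that minimizes the summed penalty.

    curves[t][w] is the estimated penalty of thread t when it owns w ways.
    Every thread receives at least one way (an assumption of this sketch).
    """
    best = None
    for alloc in product(range(1, total_ways + 1), repeat=len(curves)):
        if sum(alloc) != total_ways:
            continue
        penalty = sum(curve[w] for curve, w in zip(curves, alloc))
        if best is None or penalty < best[0]:
            best = (penalty, alloc)
    return best[1]


# Hypothetical 100-cycle memory latency.
isolated_cost = mlp_costs([(0, 100)])[0]        # one miss pays the full latency
clustered_cost = mlp_costs([(0, 100)] * 4)[0]   # four overlapping misses share it

# Hypothetical miss counts per thread, indexed by number of ways owned.
misses_a = [200, 100, 80, 60, 50]   # thread A: many misses, assumed clustered
misses_b = [40, 30, 15, 5, 2]       # thread B: few misses, assumed isolated

# Partition by raw miss counts, then by misses weighted with their cost.
by_count = best_partition([misses_a, misses_b], total_ways=4)
by_cost = best_partition(
    [[m * clustered_cost for m in misses_a],
     [m * isolated_cost for m in misses_b]],
    total_ways=4,
)
```

With these made-up numbers, counting misses alone gives thread A three of the four ways, because A misses far more often. Weighting each miss by its shared latency reverses the decision and gives three ways to thread B, whose isolated misses each pay the full 100 cycles.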