The SPLASH-2 programs: characterization and methodological considerations
ISCA '95 Proceedings of the 22nd annual international symposium on Computer architecture
Cache decay: exploiting generational behavior to reduce cache leakage power
ISCA '01 Proceedings of the 28th annual international symposium on Computer architecture
Let caches decay: reducing leakage energy via exploitation of cache generational behavior
ACM Transactions on Computer Systems (TOCS)
A New Memory Monitoring Scheme for Memory-Aware Scheduling and Partitioning
HPCA '02 Proceedings of the 8th International Symposium on High-Performance Computer Architecture
CQoS: a framework for enabling QoS in shared caches of CMP platforms
Proceedings of the 18th annual international conference on Supercomputing
Fair Cache Sharing and Partitioning in a Chip Multiprocessor Architecture
Proceedings of the 13th International Conference on Parallel Architectures and Compilation Techniques
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
Architectural support for operating system-driven CMP cache management
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Communist, utilitarian, and capitalist cache policies on CMPs: caches as a shared resource
Proceedings of the 15th international conference on Parallel architectures and compilation techniques
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture
Proceedings of the 34th annual international symposium on Computer architecture
QoS policies and architecture for cache/memory in CMP platforms
Proceedings of the 2007 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Cooperative cache partitioning for chip multiprocessors
Proceedings of the 21st annual international conference on Supercomputing
CacheScouts: Fine-Grain Monitoring of Shared Caches in CMP Platforms
PACT '07 Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques
Adaptive set pinning: managing shared caches in chip multiprocessors
Proceedings of the 13th international conference on Architectural support for programming languages and operating systems
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Adaptive insertion policies for managing shared caches
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
RapidMRC: approximating L2 miss rate curves on commodity systems for online optimizations
Proceedings of the 14th international conference on Architectural support for programming languages and operating systems
Proceedings of the 41st annual IEEE/ACM International Symposium on Microarchitecture
PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches
Proceedings of the 36th annual international symposium on Computer architecture
Proceedings of the 36th annual international symposium on Computer architecture
High performance cache replacement using re-reference interval prediction (RRIP)
Proceedings of the 37th annual international symposium on Computer architecture
SHiP: signature-based hit predictor for high performance caching
Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture
Dynamic QoS management for chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
An application-aware cache replacement policy for last-level caches
ARCS'13 Proceedings of the 26th international conference on Architecture of Computing Systems
An effectiveness-based adaptive cache replacement policy
Microprocessors & Microsystems
Hi-index | 0.00 |
In chip multiprocessors (CMPs), several high-performance cores typically compete for capacity in a shared last-level cache. This causes degraded and unpredictable memory performance for multiprogrammed and parallel workloads. In response, recent schemes apportion cache bandwidth and capacity in ways that offer better aggregate performance for the workloads. These schemes, however, focus primarily on relatively coarse-grained capacity management without concern for operating system process priority levels. In this work, we explore capacity management approaches that are both temporally and spatially more fine-grained than prior work. We also consider operating system priority levels as part of capacity management. We propose a capacity management mechanism based on timekeeping techniques that track the time interval since the last access to cached data. This Adaptive Timekeeping Replacement (ATR) scheme maintains aggregate cache occupancies that reflect the priority and footprint of each application. The key novelties of our work are (1) ATR offers a complete cache capacity management framework taking into account application priorities and memory characteristics, and (2) ATR's fine-grained cache capacity control is demonstrated to be effective and important in improving the performance of parallel workloads in addition to sequential ones. We evaluate our ideas using a full-system simulator and multiprogrammed workloads of both sequential and parallel applications. This is the first detailed study of shared cache capacity management considering thread behaviors in parallel applications. ATR outperforms an unmanaged system by as much as 1.63X and by an average of 1.19X. ATR's fine-grained temporal control is particularly important for parallel applications, which are expected to be increasingly prevalent in years to come.