Coordinating prefetching and STT-RAM based last-level cache management for multicore systems

Authors:
Mengjie Mao;Hai (Helen) Li;Alex K. Jones;Yiran Chen
Affiliations:
University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA;University of Pittsburgh, Pittsburgh, PA, USA
Venue:
Proceedings of the 23rd ACM international conference on Great lakes symposium on VLSI
Year:
2013

Abstract

Data prefetching is a common mechanism for mitigating the off-chip memory bandwidth bottleneck in modern computing systems. Unfortunately, prefetching has side effects: it places an additional burden on off-chip communication and increases the number of cache write operations. Spin-transfer torque random access memory (STT-RAM) has been proposed for last-level caches (LLCs) because of its high density and low power consumption, but its write accesses are characteristically longer than those of traditional SRAM caches. The added write pressure from prefetching, coupled with this long write latency, exacerbates the performance cost of prefetching schemes. In this work, we propose two orthogonal techniques to reduce the negative performance impact of aggressive prefetching on multicore systems that employ an STT-RAM based LLC. First, basic priority assignment ranks the different types of LLC access requests by their criticality and services them in priority order. Second, priority boosting differentiates requests by application and prioritizes the relatively few requests from applications that access the LLC non-intensively, which are usually the applications that suffer the most severe performance degradation in multicore systems. Combining these two prioritization policies alleviates the negative effects of aggressive prefetching. Our results show that these techniques achieve an average application speedup of 8.3% over a baseline prefetch-only design without prioritization.
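
The two policies can be illustrated with a small software model of an LLC request queue. This is only a sketch under stated assumptions, not the paper's hardware design: the request types and their ordering in `BASE_PRIORITY`, the `NON_INTENSIVE_THRESHOLD` cutoff, the per-application access counts, and the choice to serve boosted requests ahead of all others are hypothetical, since the abstract does not specify them.

```python
import heapq
from itertools import count

# Assumed criticality order (lower value is served first). The abstract does
# not give the actual ordering, so these request types are illustrative.
BASE_PRIORITY = {"demand_read": 0, "writeback": 1, "prefetch_fill": 2}

# Hypothetical cutoff: an application with at most this many LLC accesses in
# the last interval is treated as non-intensive and has its requests boosted.
NON_INTENSIVE_THRESHOLD = 100


class LLCRequestQueue:
    """Toy model of an LLC request queue with two-level prioritization."""

    def __init__(self, accesses_per_app):
        # Maps app_id to its LLC access count in the last interval (assumed input).
        self.accesses_per_app = accesses_per_app
        self._heap = []
        self._seq = count()  # arrival order, used as a FIFO tie-breaker

    def _is_boosted(self, app_id):
        # Priority boosting: favor applications with few LLC accesses.
        return self.accesses_per_app.get(app_id, 0) <= NON_INTENSIVE_THRESHOLD

    def enqueue(self, app_id, kind):
        boost = 0 if self._is_boosted(app_id) else 1
        # Sort key: boosted apps first, then request criticality, then arrival.
        key = (boost, BASE_PRIORITY[kind], next(self._seq))
        heapq.heappush(self._heap, (key, app_id, kind))

    def dequeue(self):
        _, app_id, kind = heapq.heappop(self._heap)
        return app_id, kind
```

In this model, a prefetch fill from a non-intensive application is served before a demand read from an intensive one, while requests from the same class of application are ordered by request type and then by arrival. Whether boosting should fully override the basic priority or only adjust it is a design choice that the abstract leaves open.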