Characterization and dynamic mitigation of intra-application cache interference

  • Authors:
  • Carole-Jean Wu;Margaret Martonosi

  • Affiliations:
  • Depts. of Electrical Engineering and Computer Science, Princeton University;Depts. of Electrical Engineering and Computer Science, Princeton University

  • Venue:
  • ISPASS '11 Proceedings of the IEEE International Symposium on Performance Analysis of Systems and Software
  • Year:
  • 2011

Quantified Score

Hi-index 0.00

Visualization

Abstract

Given the emerging dominance of CMPs, an important research problem concerns application memory performance in the face of deep memory hierarchies, where one or more caches are shared by several cores. In current systems, many factors can cause interference in the shared last-level cache (LLC). While predicting an application's memory performance is difficult enough in an idealized setup, it becomes even more complicated in real-machine environments in which interference can stem from operating system memory accesses, and even from an application's own prefetch requests and page table walks caused by TLB misses. This paper characterizes the degree by which intra-application interference factors such as page table walks and hardware prefetching influence performance. Using hardware performance counters on an Intel platform, we first characterize real-system LLC interference and show that application data memory references represent much less than half of the LLC misses, with hardware prefetching and page table walks causing considerable LLC interference. Based on these characterizations, we propose dynamic management methods to reduce intra-application interference. First, we evaluate a dynamic OS-reference-aware cache insertion policy that reduces interference and improves user IPCs by as much as 19% (5% on average). Second, to mitigate prefetch-induced LLC interference, we propose, implement, and evaluate an automatic prefetch manager that uses Intel PEBS capabilities to dynamically estimate prefetch-induced interference and accordingly adjust the aggressiveness of hardware prefetchers as programs run. Overall, our characterizations are important in highlighting the challenges of intra-application interference, and our hardware and software proposals offer significant solutions for addressing them.