Studying the impact of hardware prefetching and bandwidth partitioning in chip-multiprocessors

  • Authors:
  • Fang Liu; Yan Solihin

  • Affiliations:
  • North Carolina State University, Raleigh, NC, USA (both authors)

  • Venue:
  • Proceedings of the ACM SIGMETRICS joint international conference on Measurement and modeling of computer systems
  • Year:
  • 2011

Abstract

Modern high-performance microprocessors widely employ hardware prefetching to hide long memory access latency. While very useful, hardware prefetching tends to aggravate the bandwidth wall, a problem in which system performance is increasingly limited by the availability of off-chip pin bandwidth in Chip Multi-Processors (CMPs). In this paper, we present an analytical model-based study of how hardware prefetching and memory bandwidth partitioning affect CMP system performance and how they interact. The model includes a composite prefetching metric that helps determine the conditions under which prefetching improves system performance, a bandwidth partitioning model that accounts for prefetching effects, and a derivation of the bandwidth partition sizes for different cores that maximize weighted speedup. Through model-driven case studies, we make several observations that can be valuable for future CMP system design and optimization. We also use simulation-based empirical evaluation to validate these observations and show that maximum system performance can be achieved by selective prefetching, guided by the composite prefetching metric, coupled with dynamic bandwidth partitioning.
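
For readers unfamiliar with the weighted speedup objective named in the abstract, the sketch below uses its commonly cited definition: the sum, over cores, of each application's IPC when sharing the system divided by its IPC when running alone. This is an illustration only, not the paper's model. The function name `weighted_speedup` and the IPC values are hypothetical, and the abstract does not state the exact formulation the authors use.

```python
from typing import Sequence


def weighted_speedup(ipc_shared: Sequence[float], ipc_alone: Sequence[float]) -> float:
    """Return the sum over cores of (IPC when sharing) / (IPC when running alone).

    Assumption: this is the commonly used definition of weighted speedup;
    the paper's own formulation may differ in detail.
    """
    if len(ipc_shared) != len(ipc_alone):
        raise ValueError("need one shared and one alone IPC value per core")
    return sum(shared / alone for shared, alone in zip(ipc_shared, ipc_alone))


# Hypothetical two-core example; these IPC values are made up for illustration.
ws = weighted_speedup([0.5, 1.5], [1.0, 2.0])
print(ws)
```

Under this definition, a value equal to the number of cores would mean that no application slows down when sharing, so a bandwidth partition that raises the sum is, by this metric, the better one.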