Guided region prefetching: a cooperative hardware/software approach

  • Authors:
  • Zhenlin Wang;Doug Burger;Kathryn S. McKinley;Steven K. Reinhardt;Charles C. Weems

  • Affiliations:
  • Univ. of Massachusetts, Amherst;The University of Texas at Austin;The University of Texas at Austin;University of Michigan;Univ. of Massachusetts, Amherst

  • Venue:
  • Proceedings of the 30th annual international symposium on Computer architecture
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

Despite large caches, main-memory access latencies still cause significant performance losses in many applications. Numerous hardware and software prefetching schemes have been proposed to tolerate these latencies. Software prefetching typically provides better prefetch accuracy than hardware, but is limited by prefetch instruction overheads and the compiler's limited ability to schedule prefetches sufficiently far in advance to cover level-two cache miss latencies. Hardware prefetching can be effective at hiding these large latencies, but generates many useless prefetches and consumes considerable memory bandwidth. In this paper, we propose a cooperative hardware-software prefetching scheme called Guided Region Prefetching (GRP), which uses compiler-generated hints encoded in load instructions to regulate an aggressive hardware prefetching engine. We compare GRP against a sophisticated pure hardware stride prefetcher and a scheduled region prefetching (SRP) engine. SRP and GRP show the best performance, with respective 22% and 21% gains over no prefetching, but SRP incurs 180% extra memory traffic---nearly tripling bandwidth requirements. GRP achieves performance close to SRP, but with a mere eighth of the extra prefetching traffic, a 23% increase over no prefetching. The GRP hardware-software collaboration thus combines the accuracy of compilerbased program analysis with the performance potential of aggressive hardware prefetching, bringing the performance gap versus a perfect L2 cache under 20%.