Effective stream-based and execution-based data prefetching

  • Authors:
  • Sorin Iacobovici; Lawrence Spracklen; Sudarshan Kadambi; Yuan Chou; Santosh G. Abraham

  • Affiliations:
  • Sun Microsystems Inc., Sunnyvale, CA (all authors)

  • Venue:
  • Proceedings of the 18th annual international conference on Supercomputing
  • Year:
  • 2004

Abstract

With processor speeds continuing to outpace the memory subsystem, cache-missing memory operations are becoming increasingly important to application performance. In response to this trend, most modern processors now include hardware (HW) prefetchers, which reduce the number of cache-missing loads observed by an application. This paper analyzes the behavior of cache-missing loads in SPEC CPU2000 and highlights the inability of unit-stride and single non-unit-stride prefetchers to correctly prefetch for some commonly occurring streams. In response to this analysis, a novel multi-stride prefetcher that supports streams with up to four distinct strides is proposed. Performance analysis for SPEC CPU2000 shows that the proposed multi-stride prefetcher can outperform current stride prefetchers on several benchmarks, most notably mcf, lucas and facerec, where it achieves an additional performance gain of up to 57%. The performance of the strided HW prefetchers is also contrasted with another recently proposed prefetch scheme, runahead execution (RAE), and the synergy between the two schemes is investigated.
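As a rough, software-only illustration of why a single stride can be insufficient, here is a minimal sketch of a multi-stride detector. It is not the paper's hardware design, which the abstract does not describe. It assumes that "up to four distinct strides" can be modeled as a repeating sequence of address deltas with period at most four, observed for one stream of miss addresses. The names `detect_stride_pattern`, `predict_next`, `MAX_STRIDES`, and the "two full repetitions" confidence rule are hypothetical choices made for this example.

```python
from typing import List, Optional

# Hypothetical limit, mirroring the abstract's "up to four distinct strides".
MAX_STRIDES = 4


def detect_stride_pattern(addrs: List[int],
                          max_period: int = MAX_STRIDES) -> Optional[List[int]]:
    """Return the shortest repeating stride sequence (period <= max_period)
    that explains every observed address delta, or None if none does."""
    deltas = [b - a for a, b in zip(addrs, addrs[1:])]
    for period in range(1, max_period + 1):
        # Assumed confidence rule: require two full repetitions of the pattern.
        # Longer periods need even more deltas, so stop searching here.
        if len(deltas) < 2 * period:
            break
        pattern = deltas[:period]
        if all(d == pattern[i % period] for i, d in enumerate(deltas)):
            return pattern
    return None


def predict_next(addrs: List[int], count: int = 4,
                 max_period: int = MAX_STRIDES) -> List[int]:
    """Predict the next `count` addresses by continuing the detected pattern
    from the current phase; return [] when no pattern is detected."""
    pattern = detect_stride_pattern(addrs, max_period)
    if pattern is None:
        return []
    period = len(pattern)
    n_deltas = len(addrs) - 1  # index of the next delta in the pattern cycle
    addr = addrs[-1]
    predictions = []
    for k in range(count):
        addr += pattern[(n_deltas + k) % period]
        predictions.append(addr)
    return predictions


if __name__ == "__main__":
    # Alternating +8/+24 stream: a single-stride detector rejects it.
    stream = [0, 8, 32, 40, 64, 72]
    print(detect_stride_pattern(stream, max_period=1))  # None
    print(detect_stride_pattern(stream))                # [8, 24]
    print(predict_next(stream))                         # [96, 104, 128, 136]
```

Setting `max_period=1` makes the same detector behave like a single-stride prefetcher. Under these assumptions, it shows how a single-stride scheme can fail on a stream that a multi-stride scheme captures. Real hardware would track streams per load PC or per memory region at cache-line granularity, which this sketch omits.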