Profile-guided post-link stride prefetching

  • Authors:
  • Chi-Keung Luk; Robert Muth; Harish Patil; Richard Weiss; P. Geoffrey Lowney; Robert Cohn

  • Affiliations:
  • Intel Corporation, Shrewsbury, MA; Intel Corporation, Shrewsbury, MA; Intel Corporation, Shrewsbury, MA; Smith College, Northampton, MA; Intel Corporation, Shrewsbury, MA; Intel Corporation, Shrewsbury, MA

  • Venue:
  • ICS '02: Proceedings of the 16th International Conference on Supercomputing
  • Year:
  • 2002

Abstract

Data prefetching is an effective approach to addressing the memory latency problem. While a few processors have implemented hardware-based data prefetching, the majority of modern processors support data-prefetch instructions and rely on compilers to automatically insert prefetches. However, most prefetching schemes in commercial compilers suffer from two limitations: (1) the source code must be available before prefetching can be applied, and (2) these prefetching schemes target only loops with statically-known strided accesses. In this study, we broaden the scope of software-controlled prefetching by addressing the above two limitations. We use profiling to discover strided accesses that frequently occur during program execution but are not determinable by the compiler. We then use the strides discovered to insert prefetches into the executable directly, without the need for re-compilation. Performance evaluation was done on an Alpha 21264-based system with a 64KB data cache and an 8MB secondary cache. We find that even with such large caches, our technique offers speedups ranging from 3% to 56% in 11 out of the 26 SPEC2000 benchmarks. Our technique has been incorporated into Pixie and Spike, two products in Compaq's Tru64 Unix.
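
To make the idea of profile-discovered strides concrete, the following is a minimal sketch, not the authors' implementation. It assumes a profile is available as a list of the addresses touched by one load instruction, and it treats the load as strided when a single nonzero address delta dominates the trace. The function names, the 0.7 threshold, and the prefetch distance of 4 are all hypothetical choices made for illustration.

```python
from collections import Counter


def dominant_stride(addresses, min_fraction=0.7):
    """Return the most frequent delta between consecutive addresses if it is
    nonzero and covers at least min_fraction of all deltas; otherwise None.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    deltas = [b - a for a, b in zip(addresses, addresses[1:])]
    if not deltas:
        return None
    stride, count = Counter(deltas).most_common(1)[0]
    if stride == 0 or count / len(deltas) < min_fraction:
        return None
    return stride


def prefetch_address(current_address, stride, distance=4):
    """Address a prefetch would target: `distance` strides ahead of the load.

    The distance is a hypothetical tuning parameter.
    """
    return current_address + stride * distance


# Hypothetical trace: nodes of a linked list that happen to be laid out
# 48 bytes apart, followed by one irregular jump.
trace = [0x1000 + 48 * i for i in range(10)] + [0x9000]

stride = dominant_stride(trace)
if stride is not None:
    print(f"stride={stride}, prefetch target={hex(prefetch_address(trace[0], stride))}")
```

In this example the stride is invisible to a static analysis of pointer-chasing code, yet it shows up clearly in the observed addresses. That is the kind of access the abstract describes as frequent at run time but not determinable by the compiler.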