The Performance of Runtime Data Cache Prefetching in a Dynamic Optimization System

  • Authors:
  • Jiwei Lu; Howard Chen; Rao Fu; Wei-Chung Hsu; Bobbie Othmer; Pen-Chung Yew; Dong-Yuan Chen

  • Affiliations:
  • Department of Computer Science and Engineering, University of Minnesota, Twin Cities; Microprocessor Research Lab, Intel Corporation

  • Venue:
  • Proceedings of the 36th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2003

Abstract

Traditional software controlled data cache prefetching is often ineffective due to the lack of runtime cache miss and miss address information. To overcome this limitation, we implement runtime data cache prefetching in the dynamic optimization system ADORE (ADaptive Object code RE-optimization). Its performance has been compared with static software prefetching on the SPEC2000 benchmark suite. Runtime cache prefetching shows better performance. On an Itanium 2 based Linux workstation, it can increase performance by more than 20% over static prefetching on some benchmarks. For benchmarks that do not benefit from prefetching, the runtime optimization system adds only 1%-2% overhead. We have also collected cache miss profiles to guide static data cache prefetching in the ORC® compiler. With that information the compiler can effectively avoid generating prefetches for loops that hit well in the data cache.
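
For readers unfamiliar with the technique the abstract refers to, the following is a minimal sketch of software-controlled data prefetching in a loop. It is not taken from the paper and is not ADORE's or ORC's code. It assumes a GCC- or Clang-compatible compiler that provides the `__builtin_prefetch` builtin, and the function name `sum_with_prefetch` and the constant `PREFETCH_DISTANCE` are hypothetical choices made for illustration.

```c
#include <stddef.h>

/* Assumed prefetch distance, in elements. This value is illustrative only;
   in practice it would be tuned to the miss latency and the work done per
   iteration. */
#define PREFETCH_DISTANCE 16

/* Sums n elements of a, issuing a prefetch hint for an element a fixed
   distance ahead on each iteration. __builtin_prefetch is a GCC/Clang
   builtin and is only a hint: the result is the same with or without it. */
long sum_with_prefetch(const long *a, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* Guard so the address expression stays inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for read;
               third argument 3 = high temporal locality. */
            __builtin_prefetch(&a[i + PREFETCH_DISTANCE], 0, 3);
        }
        total += a[i];
    }
    return total;
}
```

A hint like this is fixed when the code is compiled, with no knowledge of whether the loop actually misses in the cache. That is the limitation the abstract describes: a loop that already hits well in the data cache pays for the extra instructions without gaining anything.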