Reducing Memory Latency via Read-after-Read Memory Dependence Prediction

  • Authors:
  • Andreas Moshovos (Univ. of Toronto, Ontario, Canada); Gurindar S. Sohi (Univ. of Wisconsin-Madison)

  • Venue:
  • IEEE Transactions on Computers
  • Year:
  • 2002


Abstract

We observe that typical programs exhibit highly regular read-after-read (RAR) memory dependence streams. To exploit this regularity, we introduce read-after-read (RAR) memory dependence prediction. This technique predicts 1) whether a load will access a memory location that a preceding load accesses and 2) exactly which preceding load that is. This prediction is done without actual knowledge of the corresponding memory addresses. We also present two techniques that utilize RAR memory dependence prediction to reduce memory latency. In the first technique, a load may obtain a value by naming a preceding load with which an RAR dependence is predicted. The second technique speculatively converts a series of $\mathrm{LOAD}_1\text{-}\mathrm{USE}_1,\ldots,\mathrm{LOAD}_N\text{-}\mathrm{USE}_N$ chains into a single $\mathrm{LOAD}_1\text{-}\mathrm{USE}_1\ldots\mathrm{USE}_N$ producer/consumer graph. This is done whenever RAR dependences are predicted among the $\mathrm{LOAD}_i$ instructions. Our techniques can be implemented as small extensions to the previously proposed read-after-write (RAW) dependence prediction-based speculative memory cloaking and speculative memory bypassing. On average, our RAR-based techniques provide correct values for an additional 20 percent (integer codes) and 30 percent (floating-point codes) of all loads. Moreover, a combined RAW- and RAR-based cloaking/bypassing mechanism improves performance by 6.44 percent (integer) and 4.66 percent (floating-point) over a highly aggressive dynamically scheduled superscalar processor that uses naive memory dependence speculation. By comparison, the original RAW-based cloaking/bypassing mechanism yields improvements of 4.28 percent (integer) and 3.20 percent (floating-point). When no memory dependence speculation is used, our techniques yield speedups of 9.85 percent (integer) and 6.14 percent (floating-point).
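The core idea of the first technique can be illustrated with a minimal sketch. This is not the paper's hardware design; it is a hypothetical software model (all names are illustrative) of the two phases the abstract describes: a detection phase that learns, from resolved addresses, which earlier load PC last read the same location, and a prediction phase that later names that producer load by PC alone, before any address is computed, so its value can be supplied speculatively.

```python
class RARPredictor:
    """Illustrative model of read-after-read (RAR) memory dependence
    prediction: learn (producer load PC -> consumer load PC) pairs from
    resolved addresses, then predict the pair by PC association only."""

    def __init__(self):
        self.last_reader = {}  # address -> PC of the most recent load from it
        self.dep_table = {}    # consumer load PC -> predicted producer load PC
        self.last_value = {}   # producer load PC -> value it last loaded

    def train(self, pc, address, value):
        """Called when a load resolves: if an earlier load read the same
        address, record the RAR pair for future prediction."""
        prev = self.last_reader.get(address)
        if prev is not None and prev != pc:
            self.dep_table[pc] = prev
        self.last_reader[address] = pc
        self.last_value[pc] = value

    def predict(self, pc):
        """Called early in the pipeline, before the address is known.
        Returns (producer_pc, speculative_value) or None; the speculation
        must still be verified once the real address resolves."""
        producer = self.dep_table.get(pc)
        if producer is not None and producer in self.last_value:
            return producer, self.last_value[producer]
        return None
```

For example, after one loop iteration in which a load at PC 0x10 and a later load at PC 0x20 both read address 0xBEEF, the next dynamic instance of the load at 0x20 can be predicted to depend on 0x10 and handed its value without an address computation.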