Data prefetching and address pre-calculation through instruction pre-execution with two-step physical register deallocation

  • Authors:
  • Akihiro Yamamoto;Yusuke Tanaka;Hideki Ando;Toshio Shimada

  • Affiliations:
  • Nagoya University;Nagoya University;Nagoya University;Nagoya University

  • Venue:
  • MEDEA '07 Proceedings of the 2007 workshop on MEmory performance: DEaling with Applications, systems and architecture
  • Year:
  • 2007

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes an instruction pre-execution scheme that reduces latency and early scheduling of loads for a high performance processor. Our scheme exploits the difference between the available amount of instruction-level parallelism with an unlimited number of physical registers and that with an actual number of physical registers. We introduce a scheme called two-step physical register deallocation. Our scheme deallocates physical registers at the renaming stage as a first step, and eliminates pipeline stalls caused by a physical register shortage. Instructions wait for the final deallocation as a second step in the instruction window. While waiting, the scheme allows pre-execution of instructions. This enables prefetching of load data and early calculation of memory effective addresses. In particular, our execution-based scheme has the strength on prefetch of data with an irregular access pattern. Considering the strength of an automatic prefetcher for a regular access pattern, combining it with our scheme offers the best use of our scheme. The evaluation results show that the combined scheme significantly improve performance over a processor with an automatic prefetcher.