An Integrated Hardware/Software Data Prefetching Scheme for Shared-Memory Multiprocessors

  • Authors:
  • Edward H. Gornish;Alexander Veidenbaum

  • Affiliations:
  • -;-

  • Venue:
  • International Journal of Parallel Programming
  • Year:
  • 1999

Quantified Score

Hi-index 0.00

Visualization

Abstract

Both hardware and software prefetching have been shown to be effective in tolerating the large memory latencies inherent in shared-memory multiprocessors; however, both types of prefetching have their shortcomings. While software schemes require less hardware support than hardware schemes, they must generate address calculation instructions and a prefetch instruction for each datum that needs to be prefetched. Hardware schemes, however, must become progressively more complex to be able to compute data access strides and to increase the prefetching lookahead. In this paper, we propose an integrated hardware/software prefetching method that uses simple hardware that can handle most data accesses and software prefetching for the few remaining accesses. A compile time algorithm analyzes the access streams formed by array references and determines sequences of consecutive memory accesses to an access stream that can be prefetched by the hardware mechanism. This analysis is based on the relative memory locations of consecutive accesses to an access stream and the number of intervening data references between consecutive accesses to an access stream. In addition, the prefetching lookahead can be set separately for each access stream. Our approach yields an effective scheme that minimizes both CPU overhead and hardware costs. Execution-driven simulations show our method to be very effective.