Hardware-only stream prefetching and dynamic access ordering

  • Authors:
  • Chengqiang Zhang;Sally A. McKee

  • Affiliations:
  • Department of Computer Science, University of Utah, Salt Lake City, UT;Department of Computer Science, University of Utah, Salt Lake City, UT

  • Venue:
  • Proceedings of the 14th international conference on Supercomputing
  • Year:
  • 2000

Quantified Score

Hi-index 0.00

Visualization

Abstract

Memory system bottlenecks limit performance for many applications, and computations with strided access patterns are among the hardest hit. The streams used in such applications have extremely poor cache behavior. These access patterns have the advantage of being predictable, though, and this can be exploited to improve the efficiency of the memory subsystem in two ways: memory latencies can be masked by prefetching stream data, and the latencies can be reduced by reordering stream accesses to exploit parallelism and locality within the DRAMs. Many researchers have studied hardware prefetching in its various forms. Others have examined dynamic memory scheduling to help bridge the performance gap between processors and DRAM memory systems. This study builds on these results, combining a stride-based reference prediction table, a mechanism that prefetches L2 cache lines, and a memory controller that dynamically schedules accesses to a Direct Rambus memory subsystem. We find that such a system delivers good speedups for scientific applications with regular access patterns without negatively affecting the performance of nonstreaming programs.