Effective cache prefetching on bus-based multiprocessors

  • Authors:
  • Dean M. Tullsen;Susan J. Eggers

  • Affiliations:
  • University of Washington;University of Washington

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 1995

Quantified Score

Hi-index 0.00

Visualization

Abstract

Compiler-directed cache prefetching has the potential to hide much of the high memory latency seen by current and future high-performance processors. However, prefetching is not without costs, particularly on a shared-memory multiprocessor. Prefetching can negatively affect bus utilization, overall cache miss rates, memory latencies and data sharing. We simulate the effects of a compiler-directed prefetching algorithm, running on a range of bus-based multiprocessors. We show that, despite a high memory latency, this architecture does not necessarily support prefetching well, in some cases actually causing performance degradations. We pinpoint several problems with prefetching on a shared-memory architecture (additional conflict misses, no reduction in the data-sharing traffic and associated latencies, a multiprocessor's greater sensitivity to memory utilization and the sensitivity of the cache hit rate to prefetch distance) and measure their effect on performance. We then solve those problems through architectural techniques and heuristics for prefetching that could be easily incorporated into a compiler: (1) victim caching, which eliminates most of the cache conflict misses caused by prefetching in a direct-mapped cache, (2) special prefetch algorithms for shared data, which significantly improve the ability of our basic prefetching algorithm to prefetch individual misses, and (3) compiler-based shared-data restructuring, which eliminates many of the invalidation misses the basic prefetching algorithm does not predict. The combined effect of these improvements is to make prefetching effective over a much wider range of memory architectures.