Complexity/performance tradeoffs with non-blocking loads

  • Authors:
  • K. I. Farkas; N. P. Jouppi

  • Affiliations:
  • Dept. of Electrical and Computer Engineering, University of Toronto, 10 Kings College Rd., Toronto, Ontario, Canada, M5S 1A4; Digital Equipment Corporation Western Research Lab, 250 University Avenue, Palo Alto, CA

  • Venue:
  • ISCA '94: Proceedings of the 21st Annual International Symposium on Computer Architecture
  • Year:
  • 1994

Abstract

Non-blocking loads are a very effective technique for tolerating the cache-miss latency on data cache references. In this paper, we describe several methods for implementing non-blocking loads. A range of resulting hardware complexity/performance tradeoffs are investigated using an object-code translation and instrumentation system. We have investigated the SPEC92 benchmarks and have found that for the integer benchmarks, a simple hit-under-miss implementation achieves almost all of the available performance improvement for relatively little cost. However, for most of the numeric benchmarks, more expensive implementations are worthwhile. The results also point out the importance of using a compiler capable of scheduling load instructions for cache misses rather than cache hits in non-blocking systems.
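
To make the terms in the abstract concrete, the following is a minimal toy model of a blocking cache versus a hit-under-miss cache versus one that allows several outstanding misses. It is only an illustrative sketch and not the paper's methodology (the paper uses an object-code translation and instrumentation system on SPEC92). The function name `run_cycles`, the parameter `max_outstanding`, the one-access-per-cycle issue model, and the assumption that no instruction needs a missed value before the miss returns are all hypothetical simplifications introduced here.

```python
def run_cycles(trace, miss_penalty, max_outstanding):
    """Count cycles to process a load trace under a toy cache model.

    trace           -- sequence of booleans, True = cache hit, False = miss
    miss_penalty    -- cycles until a miss returns its data (assumed fixed)
    max_outstanding -- misses allowed in flight at once:
                       0 = blocking cache, 1 = hit-under-miss, >1 = more aggressive
    Assumption: loaded values are never consumed before they arrive, i.e. the
    compiler has scheduled loads far enough ahead of their uses.
    """
    cycle = 0
    in_flight = []  # completion times of misses still being serviced

    for is_hit in trace:
        # Drop misses that have already completed by the current cycle.
        in_flight = [t for t in in_flight if t > cycle]

        if not is_hit:
            if max_outstanding == 0:
                # Blocking cache: the processor stalls for the whole miss.
                cycle += miss_penalty
            else:
                if len(in_flight) >= max_outstanding:
                    # No free slot: stall until the earliest miss completes.
                    cycle = min(in_flight)
                    in_flight = [t for t in in_flight if t > cycle]
                in_flight.append(cycle + miss_penalty)

        cycle += 1  # one issue slot per access

    # The run is not finished until all in-flight misses have returned.
    return max([cycle] + in_flight)


if __name__ == "__main__":
    isolated_misses = [False, True, True, True, False, True]
    clustered_misses = [False, False, True]
    for name, trace in [("isolated", isolated_misses), ("clustered", clustered_misses)]:
        for slots in (0, 1, 2):
            print(name, "max_outstanding =", slots, "cycles =", run_cycles(trace, 4, slots))
```

In this toy model, a trace whose misses are separated by hits gains nothing from a second outstanding miss beyond what hit-under-miss already provides, while a trace with back-to-back misses does. That mirrors, in a very simplified way, why a cheap implementation can suffice for some workloads and not for others; the actual magnitudes reported in the paper come from its own measurements, not from this sketch.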