Caching Values in the Load Store Queue

  • Authors:
  • Dan Nicolaescu;Alex Veidenbaum;Alex Nicolau

  • Affiliations:
  • University of California at Irvine;University of California at Irvine;University of California at Irvine

  • Venue:
  • MASCOTS '04 Proceedings of the IEEE Computer Society's 12th Annual International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunications Systems
  • Year:
  • 2004

Abstract

The latency of an L1 data cache continues to grow with increasing clock frequency, cache size, and associativity. The increased latency is an important source of performance loss in high-performance processors. This paper proposes to cache data using the Load-Store Queue (LSQ) hardware and data paths. Using very little additional hardware, this inexpensive cache improves performance and reduces energy consumption. The modified LSQ "caches" all previously accessed data values, going beyond existing store-to-load forwarding techniques. Both load and store data are placed in the LSQ and are retained there after the corresponding memory access instruction has been committed. It is shown that a 128-entry modified LSQ design allows an average of 51% of all loads in the SPECint2000 benchmarks to get their data from the LSQ. With a 1-cycle LSQ access latency and a 3-cycle L1 cache latency, a performance improvement of up to 7% is achieved on SPECint2000, and the average speedup is over 4%.
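
The idea in the abstract can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the authors' hardware design: the class name `ValueCachingLSQ`, the address-indexed lookup, and the oldest-first replacement policy are hypothetical choices made for illustration, since the abstract does not specify them. The helper `average_load_latency` likewise assumes that every load missing in the LSQ is served by the L1 cache at its full latency, with no extra penalty.

```python
from collections import OrderedDict


class ValueCachingLSQ:
    """Toy model of an LSQ that keeps load and store values after commit.

    Assumptions (not from the paper): entries are keyed by address and
    the oldest entry is evicted when the queue is full.
    """

    def __init__(self, capacity=128):
        self.capacity = capacity
        self.entries = OrderedDict()  # address -> value, oldest first
        self.loads = 0
        self.lsq_hits = 0

    def _insert(self, addr, value):
        if addr in self.entries:
            del self.entries[addr]            # replace the older copy
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict the oldest entry
        self.entries[addr] = value

    def store(self, addr, value, memory):
        memory[addr] = value       # the store still updates memory
        self._insert(addr, value)  # and its data stays in the queue

    def load(self, addr, memory):
        self.loads += 1
        if addr in self.entries:
            self.lsq_hits += 1     # served from the queue
            value = self.entries[addr]
        else:
            value = memory.get(addr, 0)  # falls through to the L1 cache
        self._insert(addr, value)  # load data is retained as well
        return value


def average_load_latency(hit_fraction, lsq_latency=1, l1_latency=3):
    """Mean load latency in cycles if LSQ misses cost the L1 latency."""
    return hit_fraction * lsq_latency + (1 - hit_fraction) * l1_latency
```

Under these assumptions, `average_load_latency(0.51)` evaluates to about 1.98 cycles, compared with 3 cycles when every load goes to the L1 cache. This is a back-of-the-envelope figure for load latency only; it is not a speedup estimate and does not replace the simulation results reported in the abstract.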