Reducing cache traffic and energy with macro data load
Proceedings of the 2006 International Symposium on Low Power Electronics and Design
The latency of an L1 data cache continues to grow with increasing clock frequency, cache size, and associativity. The increased latency is an important source of performance loss in high-performance processors. This paper proposes to cache data using the Load-Store Queue (LSQ) hardware and data paths. With very little additional hardware, this inexpensive cache improves performance and reduces energy consumption. The modified LSQ "caches" all previously accessed data values, going beyond existing store-to-load forwarding techniques. Both load and store data are placed in the LSQ and are retained there after the corresponding memory access instruction has committed. A 128-entry modified LSQ design allows an average of 51% of all loads in the SPECint2000 benchmarks to get their data from the LSQ. With a 1-cycle LSQ access latency and a 3-cycle L1 cache latency, performance on SPECint2000 improves by up to 7%, and the average speedup is over 4%.
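The idea of retaining load and store values in the LSQ after commit can be illustrated with a small Python toy model. This is only a sketch under stated assumptions, not the design evaluated in the paper: the class name `CachingLSQ`, the helper `expected_load_latency`, the one-entry-per-address organization, and the oldest-first eviction policy are all hypothetical, and every LSQ miss is assumed to hit in L1. The latency helper is a plain weighted average of the two latencies quoted in the abstract, and it says nothing about how the reported speedups were obtained.

```python
from collections import OrderedDict


class CachingLSQ:
    """Toy model of an LSQ that keeps load and store values after commit."""

    def __init__(self, entries=128, lsq_latency=1, l1_latency=3):
        self.entries = entries
        self.lsq_latency = lsq_latency
        self.l1_latency = l1_latency
        self.data = OrderedDict()  # address -> value, oldest insertion first
        self.loads = 0
        self.lsq_hits = 0
        self.cycles = 0

    def _insert(self, addr, value):
        # Assumed policy: one entry per address, oldest entry evicted first.
        if addr in self.data:
            del self.data[addr]
        elif len(self.data) >= self.entries:
            self.data.popitem(last=False)
        self.data[addr] = value

    def store(self, addr, value, memory):
        # Store data is written to memory and also retained in the LSQ.
        memory[addr] = value
        self._insert(addr, value)

    def load(self, addr, memory):
        self.loads += 1
        if addr in self.data:
            # Data comes from the LSQ, whether a store or an earlier load put it there.
            self.lsq_hits += 1
            self.cycles += self.lsq_latency
            return self.data[addr]
        # Assumption: every LSQ miss hits in the L1 cache.
        self.cycles += self.l1_latency
        value = memory.get(addr, 0)
        self._insert(addr, value)  # load data is retained as well
        return value


def expected_load_latency(hit_rate, lsq_latency=1, l1_latency=3):
    """Weighted-average load latency in cycles under the assumptions above."""
    return hit_rate * lsq_latency + (1 - hit_rate) * l1_latency


# Small demonstration with a 2-entry queue so that eviction is visible.
memory = {}
lsq = CachingLSQ(entries=2)
lsq.store(0x10, 7, memory)
first = lsq.load(0x10, memory)   # served from the LSQ
lsq.load(0x20, memory)           # miss, then retained
lsq.load(0x30, memory)           # miss, evicts the entry for 0x10
again = lsq.load(0x10, memory)   # miss, value comes from memory
```

With the abstract's figures (51% of loads served by the LSQ, 1-cycle LSQ access, 3-cycle L1 access), `expected_load_latency(0.51)` evaluates to about 1.98 cycles, compared with 3 cycles when every load goes to L1.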