Store Memory-Level Parallelism Optimizations for Commercial Applications

  • Authors:
  • Yuan Chou;Lawrence Spracklen;Santosh G. Abraham

  • Affiliations:
  • Sun Microsystems;Sun Microsystems;Sun Microsystems

  • Venue:
  • Proceedings of the 38th annual IEEE/ACM International Symposium on Microarchitecture
  • Year:
  • 2005

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper studies the impact of off-chip store misses on processor performance for modern commercial applications. The performance impact of off-chip store misses is largely determined by the extent of their overlap with other off-chip cache misses. The epoch MLP model is used to explain and quantify how these overlaps are affected by various store handling optimizations and by the memory consistency model implemented by the processor. The extent of these overlaps are then translated to off-chip CPI. Experimental results show that store handling optimizations are crucial for mitigating the substantial performance impact of stores in commercial applications. While some previously proposed optimizations, such as store prefetching, are highly effective, they are unable to fully mitigate the performance impact of off-chip store misses and they also leave a performance gap between the stronger and weaker memory consistency models. New optimizations, such as the Store Miss Accelerator, an optimization of Hardware Scout and a new application of Speculative Lock Elision, are demonstrated to virtually eliminate the impact of off-chip store misses.