Tackling cache-line stealing effects using run-time adaptation

  • Authors:
  • Stéphane Zuckerman;William Jalby

  • Affiliations:
  • University of Versailles Saint-Quentin-en-Yvelines, France;University of Versailles Saint-Quentin-en-Yvelines, France

  • Venue:
  • LCPC'10 Proceedings of the 23rd international conference on Languages and compilers for parallel computing
  • Year:
  • 2010

Quantified Score

Hi-index 0.00

Visualization

Abstract

Modern multicore processors are now found in mainstream systems, as well as supercomputers. They usually embed prefetching facilities to hide memory stalls. While very useful in general, there are some cases where such mechanisms can actually hamper performance, as is the case with cache-line stealing. This paper characterizes and quantifies cache-line stealing, and shows it can induce huge slowdowns - down to almost 65%. Several solutions are examined, ranging from deactivation of hardware prefetching to array reshaping. Such solutions bring between 10% and 65% speedups in the best cases. In order to apply these transformations where they are relevant, we use run-time measurements and adaptive methods to generate code wrappers to be used only when prefetching hurts performance.