MLP-Aware Runahead Threads in a Simultaneous Multithreading Processor

  • Authors:
  • Kenzo Van Craeynest; Stijn Eyerman; Lieven Eeckhout

  • Affiliations:
  • Department of Electronics and Information Systems (ELIS), Ghent University, Belgium

  • Venue:
  • HiPEAC '09 Proceedings of the 4th International Conference on High Performance Embedded Architectures and Compilers
  • Year:
  • 2008


Abstract

Threads experiencing long-latency loads on a simultaneous multithreading (SMT) processor may clog shared processor resources without making forward progress, thereby starving other threads and reducing overall system throughput. An elegant solution to the long-latency load problem in SMT processors is to employ runahead execution. Runahead threads do not block commit on a long-latency load but instead execute subsequent instructions in a speculative execution mode to expose memory-level parallelism (MLP) through prefetching. The benefit of runahead SMT threads is twofold: (i) runahead threads do not clog resources on a long-latency load, and (ii) runahead threads exploit far-distance MLP. This paper proposes MLP-aware runahead threads: runahead execution is initiated only when there is far-distance MLP to be exploited. By doing so, useless runahead executions are eliminated, thereby reducing the number of speculatively executed instructions (and thus energy consumption) while preserving the performance of the runahead thread and potentially improving the performance of the co-executing thread(s). Our experimental results show that MLP-aware runahead threads reduce the number of speculatively executed instructions by 13.9% and 10.1% for two-program and four-program workloads, respectively, compared to MLP-agnostic runahead threads, while achieving comparable system throughput and job turnaround time.
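The core decision the abstract describes — enter runahead mode on a long-latency load only if far-distance MLP is predicted — can be sketched as follows. This is a minimal illustration, not the paper's actual hardware mechanism: the per-load-PC prediction table, the MLP threshold of two overlapping misses, and all identifiers here are assumptions made for the sake of the example.

```python
class MLPPredictor:
    """Hypothetical predictor: for each stalling load's PC, remember how
    many long-latency loads were observed within the runahead window the
    last time that load triggered (or could have triggered) runahead."""

    def __init__(self):
        self.table = {}  # load PC -> last observed MLP count

    def predict(self, load_pc):
        # Unseen loads default to MLP of 0 (no far-distance MLP known).
        return self.table.get(load_pc, 0)

    def update(self, load_pc, observed_mlp):
        self.table[load_pc] = observed_mlp


def should_enter_runahead(predictor, load_pc):
    """MLP-aware policy: only runahead if the stalling load is predicted
    to overlap with at least one other long-latency load, i.e. the
    speculative execution would actually prefetch something useful."""
    return predictor.predict(load_pc) >= 2


# Illustrative training: one load with clustered misses, one isolated.
predictor = MLPPredictor()
predictor.update(0x400A10, 3)  # past window exposed 3 overlapping misses
predictor.update(0x400B20, 1)  # isolated miss: runahead would be useless

print(should_enter_runahead(predictor, 0x400A10))  # True  -> runahead
print(should_enter_runahead(predictor, 0x400B20))  # False -> just stall
```

An MLP-agnostic runahead thread would enter speculative execution in both cases; the MLP-aware policy skips the isolated miss, which is where the abstract's reduction in speculatively executed instructions comes from.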