Hardware Support for Prescient Instruction Prefetch

  • Authors:
  • Tor M. Aamodt;Paul Chow;Per Hammarlund;Hong Wang;John P. Shen

  • Affiliations:
  • University of Toronto;University of Toronto;Intel Corporation;Intel Corporation;Intel Corporation

  • Venue:
  • HPCA '04 Proceedings of the 10th International Symposium on High Performance Computer Architecture
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper proposes and evaluates hardware mechanisms for supporting prescient instruction prefetch 驴 an approach to improving single-threaded application performance by using helper threads to perform instruction prefetch. We demonstrate the need for enabling store-to-load communication and selective instruction execution when directly pre-executing future regions of an application that suffer I-cache misses. Two novel hardware mechanisms, safe-store and YAT-bits, are introduced that help satisfy these requirements. This paper also proposes and evaluates .nite state machine recall, a technique for limiting pre-execution to branches that are hard to predict by leveraging a counted I-prefetch mechanism. On a research Itanium®SMT processor with next line and streaming I-prefetch mechanisms that incurs latencies representative of next generation processors, prescient instruction prefetch can improve performance by an average of 10.0% to 22% on a set of SPEC 2000 benchmarks that suffer significant I-cache misses. Prescient instruction prefetch is found to be competitive against even the most aggressive research hardware instruction prefetch technique: fetch directed instruction prefetch.