A framework for modeling and optimization of prescient instruction prefetch

  • Authors:
  • Tor M. Aamodt;Pedro Marcuello;Paul Chow;Antonio González;Per Hammarlund;Hong Wang;John P. Shen

  • Affiliations:
  • Intel Labs, Santa Clara, CA and University of Toronto, Canada;Universitat Politécnica de Catalunya, Spain;University of Toronto, Canada;Universitat Politécnica de Catalunya, Spain;Intel Corp., Hillsboro, OR;Intel Labs, Santa Clara, CA;Intel Labs, Santa Clara, CA

  • Venue:
  • SIGMETRICS '03 Proceedings of the 2003 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
  • Year:
  • 2003

Quantified Score

Hi-index 0.00

Visualization

Abstract

This paper describes a framework for modeling macroscopic program behavior and applies it to optimizing prescient instruction prefetch -- novel technique that uses helper threads to improve single-threaded application performance by performing judicious and timely instruction prefetch. A helper thread is initiated when the main thread encounters a spawn point, and prefetches instructions starting at a distant target point. The target identifies a code region tending to incur I-cache misses that the main thread is likely to execute soon, even though intervening control flow may be unpredictable. The optimization of spawn-target pair selections is formulated by modeling program behavior as a Markov chain based on profile statistics. Execution paths are considered stochastic outcomes, and aspects of program behavior are summarized via path expression mappings. Mappings for computing reaching, and posteriori probability; path length mean, and variance; and expected path footprint are presented. These are used with Tarjan's fast path algorithm to efficiently estimate the benefit of spawn-target pair selections. Using this framework we propose a spawn-target pair selection algorithm for prescient instruction prefetch. This algorithm has been implemented, and evaluated for the Itanium Processor Family architecture. A limit study finds 4.8%to 17% speedups on an in-order simultaneous multithreading processor with eight contexts, over nextline and streaming I-prefetch for a set of benchmarks with high I-cache miss rates. The framework in this paper is potentially applicable to other thread speculation techniques.