A mechanistic performance model for superscalar out-of-order processors

  • Authors:
  • Stijn Eyerman; Lieven Eeckhout; Tejas Karkhanis; James E. Smith

  • Affiliations:
  • Ghent University, Ghent, Belgium; Ghent University, Ghent, Belgium; Advanced Micro Devices, Sunnyvale, CA; University of Wisconsin-Madison, Madison, WI

  • Venue:
  • ACM Transactions on Computer Systems (TOCS)
  • Year:
  • 2009

Abstract

A mechanistic model for out-of-order superscalar processors is developed and then applied to the study of microarchitecture resource scaling. The model divides execution time into intervals separated by disruptive miss events such as branch mispredictions and cache misses. Each type of miss event results in characterizable performance behavior for the execution time interval. By considering an interval's type and length (measured in instructions), execution time can be predicted for the interval. Overall execution time is then determined by aggregating the execution time over all intervals. The mechanistic model provides several advantages over prior modeling approaches, and, when estimating performance, it differs from detailed simulation of a 4-wide out-of-order processor by an average of 7%. The mechanistic model is applied to the general problem of resource scaling in out-of-order superscalar processors. First, we use the model to determine size relationships among microarchitecture structures in a balanced processor design. Second, we use the mechanistic model to study scaling of both pipeline depth and width in balanced processor designs. We corroborate previous results in this area and provide new results. For example, we show that at optimal design points, the pipeline depth times the square root of the processor width is nearly constant. Finally, we consider the behavior of unbalanced, overprovisioned processor designs based on insight gained from the mechanistic model. We show that in certain situations an overprovisioned processor may lead to improved overall performance. Designs where a processor's dispatch width is wider than its issue width are of particular interest.
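To make the interval idea concrete, the following is a minimal sketch of how per-interval estimates could be aggregated into an overall cycle count. It is an illustration under stated assumptions, not the authors' model: the `Interval` type, the event names, the formula "instructions divided by dispatch width plus a fixed penalty", and every penalty value in `PENALTY_CYCLES` are hypothetical placeholders chosen only to show the structure of the calculation.

```python
from dataclasses import dataclass

# Hypothetical per-event penalties in cycles. These are placeholder values
# for illustration only; they are not taken from the paper.
PENALTY_CYCLES = {
    "branch_mispredict": 15,
    "icache_miss": 10,
    "long_dcache_miss": 200,
    "none": 0,  # final interval that ends without a miss event
}


@dataclass(frozen=True)
class Interval:
    length: int      # number of instructions in the interval
    miss_event: str  # type of miss event that terminates the interval


def interval_cycles(interval: Interval, dispatch_width: int) -> int:
    """Estimate cycles for one interval (assumed simplified form).

    Assumption: the instructions stream through at the dispatch width,
    and the terminating miss event then adds a fixed penalty.
    """
    base = -(-interval.length // dispatch_width)  # ceiling division
    return base + PENALTY_CYCLES[interval.miss_event]


def total_cycles(intervals: list[Interval], dispatch_width: int) -> int:
    """Aggregate the per-interval estimates into overall execution time."""
    return sum(interval_cycles(i, dispatch_width) for i in intervals)


# A made-up instruction stream split into three intervals.
intervals = [
    Interval(400, "branch_mispredict"),
    Interval(1000, "long_dcache_miss"),
    Interval(200, "none"),
]

cycles = total_cycles(intervals, dispatch_width=4)
instructions = sum(i.length for i in intervals)
print(f"cycles={cycles}, CPI={cycles / instructions:.3f}")
```

In a sketch like this, each interval's cost depends only on its length and the type of event that ends it, which is what allows the total to be computed by a simple sum. A faithful model would replace the fixed penalties with expressions that depend on the microarchitecture, such as pipeline depth and memory latency.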