Accelerated warmup for sampled microarchitecture simulation

  • Authors:
  • John W. Haskins, Jr.; Kevin Skadron

  • Affiliations:
  • Center for Computing Sciences, Bowie, MD; University of Virginia, Charlottesville, VA

  • Venue:
  • ACM Transactions on Architecture and Code Optimization (TACO)
  • Year:
  • 2005

Abstract

To reduce the cost of cycle-accurate software simulation of microarchitectures, many researchers use statistical sampling: by simulating only a small, representative subset of the end-to-end dynamic instruction stream in cycle-accurate detail, simulations complete in much less time than simulating the cycle-by-cycle progress of an entire benchmark would take. For sampled simulation results to accurately reflect the nature of the full dynamic instruction stream, however, state in the simulated cache and branch predictor must match or closely approximate the state as it would have appeared had cycle-accurate simulation been used for the entire simulation. Researchers typically address this issue by prefixing a period of warmup, in which cache and branch predictor state are modeled in addition to programmer-visible architected state, to each cluster of contiguous instructions in the sample.

One conservative but slow approach is to always simulate cache and branch predictor state, both within the cycle-accurate clusters and among the instructions preceding each cluster. To save time, warmup heuristics have been proposed, but no single heuristic suits every benchmark. More rigorous, analytical warmup approaches are necessary to balance the requirements of high accuracy and speed in sampled simulation. This paper explores this issue and in particular demonstrates the merits of memory reference reuse latency (MRRL).

Relative to the IPC measured by modeling all precluster cache and branch predictor activity, MRRL generated an average error in IPC of less than 1% and simultaneously reduced simulation running times by an average of approximately 50% (or 95% of the maximum potential speedup).
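
The abstract names memory reference reuse latency but does not spell out the procedure, so the following is only a minimal sketch of how a reuse-latency measurement could be turned into a warmup length. It assumes that reuse latency means the number of instructions elapsed between two consecutive references to the same address, and that warmup should cover a chosen fraction of the reuses observed inside a cluster. The trace format, the function name `warmup_length`, the `coverage` parameter, and its default value are all hypothetical choices made for illustration, not details taken from the paper.

```python
import math

def warmup_length(trace, cluster_start, coverage=0.999):
    """Estimate how many precluster instructions to warm up (illustrative only).

    trace: list of (instruction_index, address) pairs in program order
           (hypothetical format).
    cluster_start: instruction index at which the cycle-accurate cluster begins.
    coverage: fraction of in-cluster reuse latencies the warmup window should
              span (an assumed tuning knob).
    """
    last_seen = {}   # address -> instruction index of its most recent reference
    latencies = []   # reuse latencies of references that fall inside the cluster
    for instr_index, address in trace:
        if instr_index >= cluster_start and address in last_seen:
            latencies.append(instr_index - last_seen[address])
        last_seen[address] = instr_index

    if not latencies:
        return 0  # nothing in the cluster reuses earlier data

    latencies.sort()
    # Smallest latency that covers the requested fraction of reuses.
    rank = max(math.ceil(coverage * len(latencies)) - 1, 0)
    # Warmup cannot extend past the beginning of the trace.
    return min(latencies[rank], cluster_start)

# Tiny hypothetical trace: two addresses touched before and inside the cluster.
example_trace = [(0, 0xA), (10, 0xB), (90, 0xA), (100, 0xB), (105, 0xA), (110, 0xB)]
example_warmup = warmup_length(example_trace, cluster_start=100)
```

Using the full reuse latency as the window length is a conservative simplification: a reference deep inside the cluster needs less precluster warmup than its latency suggests. Lowering `coverage` shortens the warmup window at the cost of leaving more cold state, which mirrors the accuracy versus speed trade-off the abstract describes.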