Representative Multiprogram Workloads for Multithreaded Processor Simulation

  • Authors:
  • Michael Van Biesbrouck; Lieven Eeckhout; Brad Calder

  • Affiliations:
  • CSE, University of California, San Diego, USA. Email: mvanbies@cs.ucsd.edu; ELIS, Ghent University, Belgium. Email: leeckhou@elis.UGent.be; Microsoft. Email: calder@cs.ucsd.edu

  • Venue:
  • IISWC '07 Proceedings of the 2007 IEEE 10th International Symposium on Workload Characterization
  • Year:
  • 2007

Abstract

Almost all new consumer-grade processors are capable of executing multiple programs simultaneously. The analysis of multiprogrammed workloads for multicore and SMT processors is challenging and time-consuming because there are many possible combinations of benchmarks to execute, and each combination may exhibit several different interesting behaviors. Missing particular combinations of program behaviors could hide performance problems with designs. It is thus of utmost importance to have a representative multiprogrammed workload when evaluating multithreaded processor designs. This paper presents a methodology that uses phase analysis, principal components analysis (PCA) and cluster analysis (CA) applied to microarchitecture-independent program characteristics in order to find important program interactions in multiprogrammed workloads. The end result is a small set of co-phases with associated weights that are representative of a multiprogrammed workload across multithreaded processor architectures. Applying our methodology to the SPEC CPU2000 benchmark suite yields 50 distinct combinations for two-context multithreaded processor simulation that researchers and architects can use for simulation. Each combination is simulated for 50 million instructions, giving a total of 2.5 billion instructions to be simulated for the SPEC CPU2000 benchmark suite. The performance prediction error with these representative combinations is under 2.5% of the real workload for absolute throughput prediction, and the combinations can also be used to make relative throughput comparisons across processor architectures.
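
The way a small set of weighted co-phases might be turned into a throughput estimate can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not state how the weights are combined, so the sketch assumes a weighted arithmetic mean, and the `CoPhase` type, the function names, and the sample weights and throughput values are all hypothetical. Only the simulation budget (50 combinations of 50 million instructions each) comes from the abstract.

```python
from dataclasses import dataclass


@dataclass
class CoPhase:
    """A representative pairing of program phases (hypothetical structure)."""
    name: str          # illustrative label, not taken from the paper
    weight: float      # share of the full workload this co-phase stands for
    throughput: float  # throughput measured when simulating this co-phase


def weighted_throughput(co_phases):
    """Estimate workload throughput as a weighted mean (assumed combination rule)."""
    total_weight = sum(c.weight for c in co_phases)
    if total_weight <= 0:
        raise ValueError("co-phase weights must sum to a positive value")
    return sum(c.weight * c.throughput for c in co_phases) / total_weight


def simulation_budget(num_combinations, instructions_per_combination):
    """Total number of instructions that must be simulated."""
    return num_combinations * instructions_per_combination


def relative_error(estimate, reference):
    """Relative error of an estimate against a reference measurement."""
    return abs(estimate - reference) / abs(reference)


# Made-up values for illustration only; these are not results from the paper.
sample = [
    CoPhase("A+B", weight=0.5, throughput=2.0),
    CoPhase("A+C", weight=0.3, throughput=1.0),
    CoPhase("B+C", weight=0.2, throughput=1.5),
]

estimate = weighted_throughput(sample)

# Budget stated in the abstract: 50 combinations x 50 million instructions.
budget = simulation_budget(50, 50_000_000)
```

With the sample values above, the estimate is a weighted mean of the three per-co-phase throughputs, and `budget` evaluates to 2,500,000,000, matching the 2.5 billion instructions quoted in the abstract. `relative_error` shows how such an estimate could be compared against a full-workload measurement, which is the kind of quantity the under-2.5% figure refers to.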