Efficient simulation of trace samples on parallel machines

  • Authors:
  • Lieven Eeckhout; Koen De Bosschere

  • Affiliations:
  • Department of Electronics and Information Systems (ELIS), Ghent University, Sint-Pietersnieuwstraat 41, Ghent B-9000, Belgium (both authors)

  • Venue:
  • Parallel Computing
  • Year:
  • 2004

Abstract

Architectural simulation of microprocessors is extremely time-consuming because of the ever-increasing complexity of current applications. To obtain realistic workloads for architectural simulation, benchmarks need to be constructed with huge dynamic instruction counts. For example, SPEC released the CPU2000 benchmark suite, whose benchmarks have dynamic instruction counts of several hundred billion instructions. This is beneficial for evaluating real hardware. However, simulating these workloads is impractical, if not impossible, given that many simulation runs are needed to evaluate a large number of design points. Trace sampling is often proposed as a practical solution to this problem. In trace sampling, several representative samples are chosen from a real program trace. Since the sampled trace is much shorter than the original trace, a significant speedup is obtained. In this paper, we study how parallel processing can speed up the simulation of sampled traces even further. To this end, we propose and evaluate two ways of distributing sampled traces over parallel machines while taking into account the additional overhead due to the cold-start problem associated with trace sampling. We conclude that in many (practical) trace sampling scenarios, 'distributing samples' is more efficient than 'distributing traces'. This information will help researchers and computer designers speed up their simulation runs.
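
The abstract does not give the sampling parameters or the cold-start strategy, so the following is only an illustrative sketch of why a sampled trace is cheaper to simulate and how cold-start handling eats into that gain. It assumes evenly spaced samples and a fixed warm-up prefix simulated before each sample; the function names and every number below are hypothetical and are not taken from the paper.

```python
def sample_starts(trace_len, sample_len, num_samples):
    """Start offsets of evenly spaced samples (assumed periodic sampling)."""
    period = trace_len // num_samples
    if period < sample_len:
        raise ValueError("samples would overlap")
    return [i * period for i in range(num_samples)]


def simulated_instructions(sample_len, num_samples, warmup_len):
    """Instructions actually simulated: each sample plus its warm-up prefix."""
    return num_samples * (sample_len + warmup_len)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
trace_len = 100_000_000_000   # full trace: 100 billion instructions
sample_len = 1_000_000        # 1 million instructions per sample
num_samples = 50
warmup_len = 1_000_000        # assumed warm-up before each sample (cold-start cost)

starts = sample_starts(trace_len, sample_len, num_samples)
cost = simulated_instructions(sample_len, num_samples, warmup_len)
reduction = trace_len / cost  # how many times fewer instructions are simulated
```

Under these assumed values, the warm-up doubles the per-sample cost, which is why the cold-start overhead matters when deciding how to split the work across machines.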