How to simulate 1000 cores

  • Authors: Matteo Monchiero; Jung Ho Ahn; Ayose Falcón; Daniel Ortega; Paolo Faraboschi

  • Affiliations: Hewlett-Packard Laboratories (all authors)

  • Venue: ACM SIGARCH Computer Architecture News
  • Year: 2009

Abstract

This paper proposes a novel methodology to efficiently simulate shared-memory multiprocessors composed of hundreds of cores. The basic idea is to use thread-level parallelism in the software system and translate it into core-level parallelism in the simulated world. To achieve this, we first augment an existing full-system simulator to identify and separate the instruction streams belonging to the different software threads. The simulator then dynamically maps each instruction flow to the corresponding core of the target multi-core architecture, taking into account the inherent thread synchronization of the running applications. Our simulator allows a user to execute any multithreaded application in a conventional full-system simulator and evaluate the performance of the application on many-core hardware. We carried out extensive simulations on the SPLASH-2 benchmark suite and demonstrated scalability up to 1024 cores with limited simulation speed degradation vs. the single-core case on a fixed workload. The results also show that the proposed technique captures the intrinsic behavior of the SPLASH-2 suite, even when we scale up the number of shared-memory cores beyond the thousand-core limit.
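
The thread-to-core mapping described in the abstract can be illustrated with a toy sketch. This is not the authors' simulator: the function name `simulate`, the trace format (an interleaved list of `(thread_id, event)` pairs), the fixed cost of one cycle per instruction, and the use of a single global barrier as the only synchronization primitive are all assumptions made for illustration. The sketch separates an interleaved instruction stream by thread, assigns each thread to its own simulated core, and aligns the per-core clocks whenever all threads reach a barrier.

```python
def simulate(trace, num_cores, cycles_per_instr=1):
    """Toy model: map per-thread instruction streams onto simulated cores.

    trace: interleaved list of (thread_id, event) pairs, where event is
           "instr" or "barrier" (hypothetical format, assumed here).
    Returns the final cycle count of every simulated core.
    """
    threads = {tid for tid, _ in trace}   # all software threads in the trace
    core_of = {}                          # thread id -> simulated core index
    clock = [0] * num_cores               # per-core cycle counters
    waiting = set()                       # threads blocked at the barrier

    for tid, event in trace:
        # Assign each newly seen thread to the next free core
        # (assumption: one thread per core).
        if tid not in core_of:
            if len(core_of) >= num_cores:
                raise ValueError("more threads than simulated cores")
            core_of[tid] = len(core_of)
        core = core_of[tid]

        if event == "instr":
            clock[core] += cycles_per_instr
        elif event == "barrier":
            waiting.add(tid)
            if waiting == threads:
                # Every thread has arrived: all of them resume at the time
                # the slowest core reached the barrier.
                release = max(clock[core_of[t]] for t in waiting)
                for t in waiting:
                    clock[core_of[t]] = release
                waiting.clear()
        else:
            raise ValueError(f"unknown event: {event!r}")
    return clock


# Usage: thread 0 runs three instructions, thread 1 runs one, both meet at
# a barrier, then thread 1 runs one more instruction.
trace = [
    (0, "instr"), (0, "instr"), (0, "instr"),
    (1, "instr"),
    (0, "barrier"), (1, "barrier"),
    (1, "instr"),
]
print(simulate(trace, num_cores=2))  # [3, 4]
```

In this toy model, the barrier is what keeps the simulated timing faithful to the application: thread 1 finishes its work early but cannot proceed until thread 0 arrives, so its core clock jumps to cycle 3 before executing its last instruction.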