Fast and cycle-accurate modeling of a multicore processor

  • Authors:
  • Asif Khan; Muralidaran Vijayaraghavan; Silas Boyd-Wickizer; Arvind

  • Affiliations:
  • Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA (all authors)

  • Venue:
  • ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
  • Year:
  • 2012

Abstract

An ideal simulator allows an architect to swiftly explore design alternatives and accurately determine their impact on performance. Design exploration requires simulators to be easily modifiable, and accurate performance estimates require detailed models. Unfortunately, detailed modeling reduces not only the ease with which a simulator can be modified but also the speed at which it executes, so fidelity is often traded for simulation speed. Although FPGA-based simulators are dramatically faster than software simulators, sacrificing fidelity is still common. In this paper we present Arete, an FPGA-based processor simulator that offers high performance along with accuracy and modifiability. We begin with a cycle-level specification of a multicore architecture that includes realistic in-order cores and detailed models of shared, coherent memory and the on-chip network. We then describe how this specification is implemented faithfully and efficiently on FPGAs. Arete delivers a performance of up to 11 MIPS per core. We run a subset of the PARSEC benchmark suite on top of off-the-shelf SMP Linux and achieve an average performance of 55 MIPS for an 8-core model. We also describe two significant architectural explorations: one involving three different branch predictors and the other requiring major modifications to the cache-coherence protocol.