Fast and cycle-accurate modeling of a multicore processor

Authors:
Asif Khan;Muralidaran Vijayaraghavan;Silas Boyd-Wickizer; Arvind
Affiliations:
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA;Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, USA
Venue:
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Year:
2012

Citing 0
Cited 1

ZSim: fast and accurate microarchitectural simulation of thousand-core systems

Proceedings of the 40th Annual International Symposium on Computer Architecture

Quantified Score

Hi-index	0.00

Visualization

Abstract

An ideal simulator allows an architect to swiftly explore design alternatives and accurately determine their impact on performance. Design exploration requires simulators to be easily modifiable, and accurate performance estimates require detailed models. Unfortunately, detailed modeling not only impacts the ease with which a simulator can be modified, but also the speed at which it can be executed, resulting in fidelity being traded for simulation speed. Although FPGA-based simulators have dramatically higher speed than software simulators, sacrificing fidelity is still common. In this paper we present Arete, an FPGA-based processor simulator, which offers high performance along with accuracy and modifiability. We begin with a cycle-level specification of a multicore architecture which includes realistic in-order cores and detailed models of shared, coherent memory and on-chip network. We then describe how this specification is implemented faithfully and efficiently on FPGAs. Arete delivers a performance of up to 11 MIPS per core. We run a subset of the PARSEC benchmark suite on top of off-the-shelf SMP Linux, and achieve an average performance of 55 MIPS for an 8-core model.We also describe two significant architectural explorations: one involving three different branch predictors and the other requiring major modifications to the cache-coherence protocol.