How to simulate 1000 cores

Authors:
Matteo Monchiero;Jung Ho Ahn;Ayose Falcón;Daniel Ortega;Paolo Faraboschi
Affiliations:
Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories;Hewlett-Packard Laboratories
Venue:
ACM SIGARCH Computer Architecture News
Year:
2009


Abstract

This paper proposes a novel methodology to efficiently simulate shared-memory multiprocessors composed of hundreds of cores. The basic idea is to use thread-level parallelism in the software system and translate it into core-level parallelism in the simulated world. To achieve this, we first augment an existing full-system simulator to identify and separate the instruction streams belonging to the different software threads. Then, the simulator dynamically maps each instruction flow to the corresponding core of the target multi-core architecture, taking into account the inherent thread synchronization of the running applications. Our simulator allows a user to execute any multithreaded application in a conventional full-system simulator and evaluate the performance of the application on many-core hardware. We carried out extensive simulations on the SPLASH-2 benchmark suite and demonstrated scalability up to 1024 cores with limited simulation speed degradation vs. the single-core case on a fixed workload. The results also show that the proposed technique captures the intrinsic behavior of the SPLASH-2 suite, even when we scale up the number of shared-memory cores beyond the thousand-core limit.
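
The two steps named in the abstract (separating instruction streams by software thread, then mapping each stream to a target core) can be illustrated with a minimal sketch. This is not the authors' implementation: the trace format of `(thread_id, instruction)` pairs, the function names `demux_by_thread` and `map_threads_to_cores`, and the round-robin assignment are all assumptions made for illustration, and thread synchronization is deliberately left out.

```python
from collections import defaultdict


def demux_by_thread(trace):
    """Split an interleaved trace into one instruction stream per thread.

    `trace` is an iterable of (thread_id, instruction) pairs; this format
    is an assumption for the sketch, not the simulator's real interface.
    """
    streams = defaultdict(list)
    for thread_id, instruction in trace:
        streams[thread_id].append(instruction)
    return dict(streams)


def map_threads_to_cores(streams, num_cores):
    """Assign each thread to a simulated core.

    Threads are assigned round-robin in order of first appearance
    (a hypothetical policy chosen only to keep the example small).
    """
    if num_cores < 1:
        raise ValueError("num_cores must be at least 1")
    return {tid: index % num_cores for index, tid in enumerate(streams)}


# A tiny interleaved trace from two software threads (IDs 7 and 9).
trace = [(7, "load r1"), (9, "load r2"), (7, "add r1"), (9, "store r2")]
streams = demux_by_thread(trace)
core_of = map_threads_to_cores(streams, num_cores=1024)
```

Here each thread's instructions keep their original program order within the stream, which is the property a per-core timing model would rely on; a real simulator would additionally have to honor locks and barriers when interleaving the streams.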