Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
The Wisconsin Wind Tunnel: virtual prototyping of parallel computers
SIGMETRICS '93 Proceedings of the 1993 ACM SIGMETRICS conference on Measurement and modeling of computer systems
U-Net: a user-level network interface for parallel and distributed computing
SOSP '95 Proceedings of the fifteenth ACM symposium on Operating systems principles
Embra: fast and flexible machine simulation
Proceedings of the 1996 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Optimistic simulation of parallel architectures using program executables
PADS '96 Proceedings of the tenth workshop on Parallel and distributed simulation
Fast out-of-order processor simulation using memoization
Proceedings of the eighth international conference on Architectural support for programming languages and operating systems
Facile: a language and compiler for high-performance processor simulators
Proceedings of the ACM SIGPLAN 2001 conference on Programming language design and implementation
Full-system timing-first simulation
SIGMETRICS '02 Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems
Automatically characterizing large scale program behavior
Proceedings of the 10th international conference on Architectural support for programming languages and operating systems
Parallel simulation of chip-multiprocessor architectures
ACM Transactions on Modeling and Computer Simulation (TOMACS)
SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling
Proceedings of the 30th annual international symposium on Computer architecture
Pin: building customized program analysis tools with dynamic instrumentation
Proceedings of the 2005 ACM SIGPLAN conference on Programming language design and implementation
Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset
ACM SIGARCH Computer Architecture News - Special issue: dasCMP'05
The M5 Simulator: Modeling Networked Systems
IEEE Micro
FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
The PARSEC benchmark suite: characterization and architectural implications
Proceedings of the 17th international conference on Parallel architectures and compilation techniques
Two hardware-based approaches for deterministic multiprocessor replay
Communications of the ACM - One Laptop Per Child: Vision vs. Reality
An analysis of on-chip interconnection networks for large-scale chip multiprocessors
ACM Transactions on Architecture and Code Optimization (TACO)
RAMP gold: an FPGA-based architecture simulator for multiprocessors
Proceedings of the 47th Design Automation Conference
ATAC: a 1000-core cache-coherent processor with on-chip optical network
Proceedings of the 19th international conference on Parallel architectures and compilation techniques
Tessellation: space-time partitioning in a manycore client OS
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Lithe: enabling efficient composition of parallel libraries
HotPar'09 Proceedings of the First USENIX conference on Hot topics in parallelism
Corey: an operating system for many cores
OSDI'08 Proceedings of the 8th USENIX conference on Operating systems design and implementation
Adaptive and Speculative Slack Simulations of CMPs on CMPs
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
The ZCache: Decoupling Ways and Associativity
MICRO '43 Proceedings of the 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture
DRAMSim2: A Cycle Accurate Memory System Simulator
IEEE Computer Architecture Letters
Vantage: scalable and efficient fine-grain cache partitioning
Proceedings of the 38th annual international symposium on Computer architecture
HAsim: FPGA-based high-detail multicore simulation using time-division multiplexing
HPCA '11 Proceedings of the 2011 IEEE 17th International Symposium on High Performance Computer Architecture
ACM SIGARCH Computer Architecture News
MARSS: a full system simulator for multicore x86 CPUs
Proceedings of the 48th Design Automation Conference
Sniper: exploring the level of abstraction for scalable and accurate parallel multi-core simulation
Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis
Dynamic Fine-Grain Scheduling of Pipeline Parallelism
PACT '11 Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques
Improving network connection locality on multicore systems
Proceedings of the 7th ACM european conference on Computer Systems
SCD: A scalable coherence directory with flexible sharer set encoding
HPCA '12 Proceedings of the 2012 IEEE 18th International Symposium on High-Performance Computer Architecture
Fast and cycle-accurate modeling of a multicore processor
ISPASS '12 Proceedings of the 2012 IEEE International Symposium on Performance Analysis of Systems & Software
Towards energy-proportional datacenter memory with mobile DRAM
Proceedings of the 39th Annual International Symposium on Computer Architecture
Rethinking DRAM Power Modes for Energy Proportionality
MICRO-45 Proceedings of the 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture
Power struggles: Revisiting the RISC vs. CISC debate on contemporary ARM and x86 architectures
HPCA '13 Proceedings of the 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA)
Jigsaw: scalable software-defined caches
PACT '13 Proceedings of the 22nd international conference on Parallel architectures and compilation techniques
Ubik: efficient cache sharing with strict qos for latency-critical workloads
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
SI-TM: reducing transactional memory abort rates through snapshot isolation
Proceedings of the 19th international conference on Architectural support for programming languages and operating systems
PCantorSim: Accelerating parallel architecture simulation through fractal-based sampling
ACM Transactions on Architecture and Code Optimization (TACO)
Hi-index | 0.00 |
Architectural simulation is time-consuming, and the trend towards hundreds of cores is making sequential simulation even slower. Existing parallel simulation techniques either scale poorly due to excessive synchronization, or sacrifice accuracy by allowing event reordering and using simplistic contention models. As a result, most researchers use sequential simulators and model small-scale systems with 16-32 cores. With 100-core chips already available, developing simulators that scale to thousands of cores is crucial. We present three novel techniques that, together, make thousand-core simulation practical. First, we speed up detailed core models (including OOO cores) with instruction-driven timing models that leverage dynamic binary translation. Second, we introduce bound-weave, a two-phase parallelization technique that scales parallel simulation on multicore hosts efficiently with minimal loss of accuracy. Third, we implement lightweight user-level virtualization to support complex workloads, including multiprogrammed, client-server, and managed-runtime applications, without the need for full-system simulation, sidestepping the lack of scalable OSs and ISAs that support thousands of cores. We use these techniques to build zsim, a fast, scalable, and accurate simulator. On a 16-core host, zsim models a 1024-core chip at speeds of up to 1,500 MIPS using simple cores and up to 300 MIPS using detailed OOO cores, 2-3 orders of magnitude faster than existing parallel simulators. Simulator performance scales well with both the number of modeled cores and the number of host cores. We validate zsim against a real Westmere system on a wide variety of workloads, and find performance and microarchitectural events to be within a narrow range of the real system.