Parallel discrete event simulation
Communications of the ACM - Special issue on simulation
Complete Computer System Simulation: The SimOS Approach
IEEE Parallel & Distributed Technology: Systems & Technology
Modeling Speedup (n) Greater than n
IEEE Transactions on Parallel and Distributed Systems
SHARED VIRTUAL MEMORY AND GENERALIZED SPEEDUP
SHARED VIRTUAL MEMORY AND GENERALIZED SPEEDUP
Simulation of Computer Architectures: Simulators, Benchmarks, Methodologies, and Recommendations
IEEE Transactions on Computers
Parallel and distributed simulation: traditional techniques and recent advances
Proceedings of the 38th conference on Winter simulation
QEMU, a fast and portable dynamic translator
ATEC '05 Proceedings of the annual conference on USENIX Annual Technical Conference
FPGA-Accelerated Simulation Technologies (FAST): Fast, Full-System, Cycle-Accurate Simulators
Proceedings of the 40th Annual IEEE/ACM International Symposium on Microarchitecture
Parallelization of IBM mambo system simulator in functional modes
ACM SIGOPS Operating Systems Review
COTSon: infrastructure for full system simulation
ACM SIGOPS Operating Systems Review
ProtoFlex: Towards Scalable, Full-System Multiprocessor Simulations Using FPGAs
ACM Transactions on Reconfigurable Technology and Systems (TRETS)
Parallelizing SystemC Kernel for Fast Hardware Simulation on SMP Machines
PADS '09 Proceedings of the 2009 ACM/IEEE/SCS 23rd Workshop on Principles of Advanced and Distributed Simulation
CRAW/P: a workload partition method for the efficient parallel simulation of manycores
Euro-Par'12 Proceedings of the 18th international conference on Parallel Processing
Hi-index | 0.00 |
Multi-core processors are commonly available now, but most traditional computer architectural simulators still use single-thread execution. In this paper we use parallel discrete event simulation (PDES) to speedup a cycle-accurate event-driven many-core processor simulator. Evaluation against the sequential version shows that the parallelized one achieves an average speedup of 10.9x (up to 13.6x) running SPLASH-2 kernel on a 16-core host machine, with cycle counter differences of less than 0.1%. Moreover, super-linear speedups are achieved between running 1 thread and 8 threads due to reduced overhead of insert-event-to-queue time and increased cache size in parallel processing. We conclude that PDES could be an attractive option for achieving fast cycle-accurate many-core processor simulations.