Fast simulation of chip multiprocessors (CMPs) is a critical challenge for the architecture research community as both industry and academia shift their research focus to multicore design. Parallel simulation accelerates microarchitecture simulation of CMPs by exploiting the parallelism inherent in the target CMP. In this paper, we explore the paradigm of simulating each core of a target CMP in its own thread and spreading those threads across the hardware thread contexts of a host CMP. We implement several parallel simulation schemes using POSIX Threads (Pthreads). We start with cycle-by-cycle simulation and then relax the synchronization condition in various schemes, which we call slack simulations. In slack simulations, the Pthreads simulating different target cores do not synchronize after each simulated cycle; instead, they are given some slack. The slack is the difference, in cycles, between the simulated times of any two target cores. Small slacks, such as a few cycles, greatly improve the efficiency of parallel CMP simulations with no or negligible simulation error. We have developed a simulation framework called SlackSim to experiment with various slack simulation schemes. Unlike previous attempts to parallelize multiprocessor simulations on distributed-memory machines, SlackSim takes advantage of the efficient sharing of data in the host CMP architecture. We demonstrate the efficiency and accuracy of some well-known slack simulation schemes, and of some new ones, on SlackSim running on a state-of-the-art CMP platform.
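The bounded-slack idea described above can be illustrated with a minimal sketch. It is not SlackSim's implementation: it uses Python's `threading` module in place of Pthreads, and the function name `run_bounded_slack`, its parameters, and the bookkeeping variables are all hypothetical. Each thread stands in for one target core and may begin its next simulated cycle only if its local time is at most `slack` cycles ahead of the slowest core. Setting `slack` to 0 reduces this to cycle-by-cycle (lockstep) simulation.

```python
import threading


def run_bounded_slack(num_cores, total_cycles, slack):
    """Advance num_cores simulated clocks to total_cycles under a slack bound.

    Hypothetical illustration only. Returns (local_time, max_lead), where
    max_lead is the largest lead over the slowest core that any core had
    at the moment it started a cycle.
    """
    local_time = [0] * num_cores   # cycles completed by each simulated core
    cond = threading.Condition()   # guards local_time and max_lead
    max_lead = 0

    def core_thread(cid):
        nonlocal max_lead
        for _ in range(total_cycles):
            with cond:
                # Block while this core is more than `slack` cycles
                # ahead of the slowest core.
                cond.wait_for(
                    lambda: local_time[cid] - min(local_time) <= slack
                )
                max_lead = max(max_lead, local_time[cid] - min(local_time))
            # A real simulator would model one target cycle for core `cid`
            # here, outside the lock, so that cores run in parallel.
            with cond:
                local_time[cid] += 1
                cond.notify_all()  # wake cores waiting on the slowest one

    threads = [
        threading.Thread(target=core_thread, args=(cid,))
        for cid in range(num_cores)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local_time, max_lead
```

For example, `run_bounded_slack(4, 1000, 0)` forces every core to wait for all others before each cycle, while `run_bounded_slack(4, 1000, 5)` lets faster threads run up to five cycles ahead, which reduces how often threads block. The slowest core never waits, so the scheme cannot deadlock.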