Developing complex technical systems requires a systematic exploration of the design space in order to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires a large number of simulation runs. The resulting runtime demands severely hamper thorough design space exploration. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large-scale parameter studies on GPUs. To accommodate the stream-processing paradigm of GPUs, our parallelization scheme exploits two orthogonal levels of parallelism: external parallelism among the inherently independent simulations of a parameter study, and internal parallelism among independent events within each individual simulation. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism to hide the transfer latencies between host and GPU memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.
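The idea of aggregating events across the independent simulations of a parameter study can be illustrated with a small CPU-only sketch. This is not the prototype described above: the function name `run_parameter_study`, the toy single-job queueing model, and the two event types are assumptions made purely for illustration, and the inner loop over each batch merely stands in for what would be a single GPU kernel launch.

```python
import heapq
from collections import defaultdict


def run_parameter_study(service_times, horizon):
    """Toy parameter study: one simulation per service time (hypothetical model).

    Each simulation cycles a single job: an arrival schedules a departure
    after the service time, and a departure schedules the next arrival.
    Returns (completed jobs per simulation, sizes of the aggregated batches).
    """
    # One future-event list per simulation, holding (timestamp, event_type).
    queues = [[(0.0, "arrival")] for _ in service_times]
    completed = [0] * len(service_times)
    batch_sizes = []

    while True:
        # External parallelism: the simulations are independent, so the
        # earliest event of each one can be gathered into a shared batch.
        batches = defaultdict(list)
        for sim_id, queue in enumerate(queues):
            if queue and queue[0][0] <= horizon:
                timestamp, kind = heapq.heappop(queue)
                batches[kind].append((sim_id, timestamp))
        if not batches:
            break

        for kind, batch in batches.items():
            batch_sizes.append(len(batch))
            # Stand-in for one kernel launch: the same handler is applied
            # to every event of this type in the batch.
            for sim_id, timestamp in batch:
                if kind == "arrival":
                    departure = timestamp + service_times[sim_id]
                    heapq.heappush(queues[sim_id], (departure, "departure"))
                else:
                    completed[sim_id] += 1
                    heapq.heappush(queues[sim_id], (timestamp, "arrival"))

    return completed, batch_sizes


if __name__ == "__main__":
    done, sizes = run_parameter_study([1.0, 2.0, 4.0], horizon=8.0)
    print(done, sizes[:3])
```

Grouping by event type means every element of a batch runs the same handler, which is the kind of uniform workload a stream processor favors. The batches shrink as individual simulations reach the horizon, which hints at why a real scheme needs enough simulations in flight to keep the device busy.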