Multi-level Parallelism for Time- and Cost-Efficient Parallel Discrete Event Simulation on GPUs

Authors:
Georg Kunz; Daniel Schemmel; James Gross; Klaus Wehrle
Venue:
PADS '12 Proceedings of the 2012 ACM/IEEE/SCS 26th Workshop on Principles of Advanced and Distributed Simulation
Year:
2012

Abstract

Developing complex technical systems requires a systematic exploration of the design space to identify optimal system configurations. However, studying the effects and interactions of even a small number of system parameters often requires a large number of simulation runs. The resulting runtime demands severely hamper thorough design space exploration. In this paper, we present a parallel discrete event simulation scheme that enables cost- and time-efficient execution of large-scale parameter studies on GPUs. To accommodate the stream-processing paradigm of GPUs efficiently, our parallelization scheme exploits two orthogonal levels of parallelism: external parallelism among the inherently independent simulations of a parameter study, and internal parallelism among independent events within each individual simulation. Specifically, we design an event aggregation strategy based on external parallelism that generates workloads suitable for GPUs. In addition, we define a pipelined event execution mechanism based on internal parallelism that hides the transfer latencies between host and GPU memory. We analyze the performance characteristics of our parallelization scheme by means of a prototype implementation and show a 25-fold performance improvement over purely CPU-based execution.
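
The abstract does not say how events are aggregated, so the following is only a minimal sketch of one possible reading of aggregation based on external parallelism. It assumes that the simulations of a parameter study run the same model with different parameters, so their next events can be grouped by event type and handed to a single handler as one batch. The names `Event`, `aggregate_next_events`, and `run_batch` are hypothetical and not taken from the paper, and a plain Python loop stands in for the GPU kernel launch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    sim_id: int       # simulation of the parameter study this event belongs to
    timestamp: float  # simulated time at which the event fires
    kind: str         # event type; events of one kind share a handler


def aggregate_next_events(pending):
    """Group the earliest pending event of each simulation by event kind.

    `pending` maps sim_id -> list of Events sorted by timestamp (assumed).
    Returns a dict mapping kind -> list of Events, i.e. one batch per handler.
    """
    batches = defaultdict(list)
    for queue in pending.values():
        if queue:  # skip simulations that have no pending events
            head = queue[0]
            batches[head.kind].append(head)
    return dict(batches)


def run_batch(kind, batch, handlers):
    """Stand-in for one GPU kernel launch: apply one handler to a whole batch."""
    handler = handlers[kind]
    return [handler(event) for event in batch]
```

In this sketch the simulations never interact, so each batch can be processed without any synchronization between its events. A pipelined variant would overlap copying the next batch to the GPU with executing the current one; that part is not shown here.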