There are strong reasons to execute a large-scale discrete-event simulation on a cluster of processor nodes, each of which may be a shared-memory multiprocessor or a uniprocessor. This is the architecture of the largest-scale parallel machines, so the largest simulation problems can only be solved this way. It is also a common architecture in less esoteric settings and is well suited to memory-bound simulations. This paper describes our approach to porting the SSF simulation kernel to this architecture using the Message Passing Interface (MPI). The notable feature of this port is an efficient two-level synchronization and communication scheme that addresses the cost discrepancies between shared memory and distributed memory. In the initial implementation, we use a globally synchronous approach between distributed-memory nodes and an asynchronous shared-memory approach within each SMP node. Because the SSF API reflects inherently shared-memory assumptions, we also report on our approach to porting an SSF kernel to a cluster of SMP nodes. Experimental results on two architectures are described for a model of TCP/IP traffic flows over a hierarchical network. Performance on a distributed network of commodity SMPs connected through Ethernet is seen to frequently exceed performance on a Sun shared-memory multiprocessor.
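To make the two-level scheme concrete, the following is a minimal sketch of the globally synchronous level: a window-based conservative step in which all nodes agree on a safe time bound (the earliest pending event plus a lookahead, standing in for an MPI reduction) and then process events inside the window independently. All names, the `LOOKAHEAD` value, and the event representation are illustrative assumptions, not the SSF kernel's actual interfaces.

```python
import heapq

# Hypothetical sketch of window-based conservative synchronization:
# a synchronous global reduction across "nodes" (in a real port, an
# MPI_Allreduce over per-node clocks), followed by asynchronous
# per-node event processing within the agreed window.

LOOKAHEAD = 5  # assumed minimum timestamp increment on cross-node events

class Node:
    def __init__(self, events):
        self.events = sorted(events)     # min-heap of (timestamp, payload)
        heapq.heapify(self.events)
        self.processed = []

    def advance_to(self, bound):
        """Process all local events strictly below the window bound."""
        while self.events and self.events[0][0] < bound:
            self.processed.append(heapq.heappop(self.events))

def next_window_bound(nodes):
    """Global reduction: earliest pending event time plus lookahead."""
    earliest = min((n.events[0][0] for n in nodes if n.events), default=None)
    return None if earliest is None else earliest + LOOKAHEAD

def run(nodes):
    while True:
        bound = next_window_bound(nodes)  # synchronous step: all nodes agree
        if bound is None:
            break
        for n in nodes:                   # in a real kernel, each node runs
            n.advance_to(bound)           # this window concurrently

nodes = [Node([(1, "a"), (8, "b")]), Node([(3, "c")])]
run(nodes)
```

Within a window, a real implementation would let threads on each SMP node process events asynchronously through shared memory; only the window-bound computation requires inter-node communication, which is the cost discrepancy the two-level scheme exploits.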