A replication structure for efficient and fault-tolerant parallel and distributed simulations

  • Authors:
  • Zengxiang Li;Wentong Cai;Stephen John Turner;Ke Pan

  • Affiliations:
  • Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;Nanyang Technological University, Singapore;Nanyang Technological University, Singapore

  • Venue:
  • SpringSim '10 Proceedings of the 2010 Spring Simulation Multiconference
  • Year:
  • 2010

Abstract

Large-scale parallel and distributed simulations (federations) are developed to study complex systems. Their executions are usually computationally intensive and involve a large number of simulation components (federates), which may be developed by different participants and executed at different locations. Hence, it is attractive to provide mechanisms that can accelerate these executions and tolerate federate failures. Previously, we proposed a federate replication structure that improves simulation performance by replicating federates with alternative synchronization approaches and automatically choosing the fastest replica to represent the federate in the federation execution. In this paper, we extend the replication structure so that it retains these performance advantages in the presence of failures. Besides presenting the design and implementation details, we report experimental results demonstrating that the extended replication structure provides fault tolerance while maintaining its performance advantages for simulation executions.
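The idea of choosing the fastest replica and falling back when replicas fail can be illustrated with a small sketch. It is not the authors' implementation. The `Replica` class, the `choose_representative` function, and the use of the logical time a replica has reached as the measure of "fastest" are all hypothetical assumptions made for illustration. Failed replicas are excluded from selection, so losing the current representative promotes the next-fastest surviving replica.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Replica:
    # Hypothetical model of a federate replica (assumption, not from the paper):
    # a name, the synchronization approach it uses, the logical time it has
    # reached so far, and whether it is still alive.
    name: str
    sync_approach: str  # illustrative labels such as "conservative" or "optimistic"
    logical_time: float
    alive: bool = True


def choose_representative(replicas: List[Replica]) -> Optional[Replica]:
    """Return the alive replica that has advanced furthest in logical time.

    Assumption: "fastest" is measured by logical-time progress. Failed replicas
    are skipped, so a crash of the current representative simply promotes the
    next-fastest surviving replica. Returns None if every replica has failed.
    """
    candidates = [r for r in replicas if r.alive]
    if not candidates:
        return None
    return max(candidates, key=lambda r: r.logical_time)
```

For example, with a conservative replica at logical time 120.0 and an optimistic one at 150.0, the optimistic replica is chosen; if it then fails, the conservative replica takes over. A real replication structure would also have to keep state and message delivery consistent across such a switch, which this sketch does not model.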