Federate Fault Tolerance in HLA-Based Simulation

  • Authors:
  • Zengxiang Li; Wentong Cai; Stephen John Turner; Ke Pan

  • Affiliations:
  • Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore; Sch. of Comput. Eng., Nanyang Technol. Univ., Singapore, Singapore

  • Venue:
  • PADS '10 Proceedings of the 2010 IEEE Workshop on Principles of Advanced and Distributed Simulation
  • Year:
  • 2010

Abstract

A large-scale HLA-based simulation (federation) is composed of a large number of simulation components (federates), which may be developed by different participants and executed at different locations. These federates can fail for a variety of reasons, and the risk of federation failure increases with the number of federates in the federation. In this paper, a fault tolerance mechanism is proposed to tolerate crash-stop failures of federates. By exploiting the decoupled federate architecture, federate failures can be masked from the federation, and recovery can take place without interrupting the execution of other federates. A basic state recovery protocol is first proposed, which recovers the state of the failed federate from the checkpoint and message log taken before the failure. An optimized protocol is then developed to accelerate the state recovery procedure. Experiments verify that the proposed mechanism provides correct failure recovery. The experimental results also indicate that the optimized protocol can considerably outperform the basic one.
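
The abstract does not give the details of either protocol, so the following is only a generic sketch of checkpoint-plus-message-log recovery, not the authors' mechanism. It assumes that message handling is deterministic and that the checkpoint and log live in storage that survives the crash. All names (Federate, StableStore, deliver, recover, demo) are hypothetical and do not correspond to any HLA or RTI API.

```python
from copy import deepcopy


class Federate:
    """Toy federate whose state is a running total of received values."""

    def __init__(self, state=None):
        self.state = state if state is not None else {"total": 0, "processed": 0}

    def handle(self, value):
        # Deterministic handler: replaying the same messages in the same
        # order always reproduces the same state (an assumption of this sketch).
        self.state["total"] += value
        self.state["processed"] += 1


class StableStore:
    """Stands in for storage that survives a federate crash (hypothetical)."""

    def __init__(self, initial_state):
        self.checkpoint = deepcopy(initial_state)
        self.log = []

    def save_checkpoint(self, federate):
        # Messages already reflected in the checkpoint no longer need replaying.
        self.checkpoint = deepcopy(federate.state)
        self.log.clear()

    def log_message(self, value):
        self.log.append(value)


def deliver(federate, store, value):
    """Log the message first, then let the federate process it."""
    store.log_message(value)
    federate.handle(value)


def recover(store):
    """Rebuild a replacement federate from the last checkpoint and the log."""
    replacement = Federate(deepcopy(store.checkpoint))
    for value in store.log:
        replacement.handle(value)
    return replacement


def demo():
    federate = Federate()
    store = StableStore(federate.state)
    for value in (1, 2, 3):
        deliver(federate, store, value)
    store.save_checkpoint(federate)
    for value in (10, 20):
        deliver(federate, store, value)
    state_before_crash = deepcopy(federate.state)
    del federate  # simulate a crash-stop failure
    recovered = recover(store)
    return state_before_crash, recovered.state
```

Calling demo() returns the state just before the simulated crash and the state rebuilt by replay; under the determinism assumption the two are equal. Replaying the whole log is the simplest possible strategy; the abstract does not say how the optimized protocol accelerates recovery, so this sketch does not attempt to model it.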