Towards Fault-tolerant HLA-based Distributed Simulations

  • Authors:
  • Dan Chen; Stephen J. Turner; Wentong Cai

  • Affiliations:
  • Institute of Electrical Engineering, Yanshan University, Qinhuangdao 066004, China; School of Computer Engineering, Nanyang Technological University, 639798 Singapore; School of Computer Engineering, Nanyang Technological University, 639798 Singapore

  • Venue:
  • Simulation
  • Year:
  • 2008

Abstract

Large-scale High Level Architecture (HLA)-based simulations are built to study complex problems, and they often involve many federates and vast computing resources. Simulation federates running at different locations are subject to failure, and the failure of a single federate can crash the entire simulation execution. This risk increases with the scale of a distributed simulation, so fault tolerance is required to support runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a 'Decoupled Federate Architecture'. The framework provides a generic fault-tolerant model that handles failure through dynamic substitution. A sender-based method is designed to ensure reliable delivery of in-transit messages, and it is coupled with a novel algorithm for effective fossil collection. The fault-tolerant model also avoids unnecessary repeated computation when handling failure. Using a middleware approach, the framework supports reuse of legacy federate code, is platform-neutral, and is independent of federate modeling approaches. Experiments using a supply-chain simulation as an example have been carried out to validate and benchmark the fault-tolerant federates. The experimental results show that the framework provides correct failure recovery. They also indicate that the framework incurs only minimal overhead for fault tolerance and shows promising scalability.
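
To make the idea of sender-based message retention with fossil collection more concrete, the following is a minimal generic sketch in Python. It is not the paper's algorithm, which the abstract does not specify. The class `SenderLog`, its methods, and the use of a receiver checkpoint time as the fossil-collection threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLog:
    """Hypothetical sender-side log of timestamped messages (illustrative only)."""

    # Maps a receiver name to its retained (timestamp, payload) pairs, in send order.
    pending: dict = field(default_factory=dict)

    def send(self, receiver, timestamp, payload):
        # Retain a copy of each outgoing message so it can be resent after a failure.
        message = (timestamp, payload)
        self.pending.setdefault(receiver, []).append(message)
        return message

    def fossil_collect(self, receiver, checkpoint_time):
        # Assumption: messages stamped at or before the receiver's checkpoint
        # time are no longer needed for recovery and can be discarded.
        retained = self.pending.get(receiver, [])
        kept = [m for m in retained if m[0] > checkpoint_time]
        self.pending[receiver] = kept
        return len(retained) - len(kept)

    def replay(self, receiver):
        # Messages that would be resent to a substitute for the failed receiver.
        return list(self.pending.get(receiver, []))
```

In this sketch, a sender calls `send` for each outgoing message, calls `fossil_collect` when a receiver reports a checkpoint, and calls `replay` to obtain the messages a substitute federate would still need.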