Large-scale High Level Architecture (HLA)-based simulations are built to study complex problems, and they often involve many federates and vast computing resources. Simulation federates running at different locations are subject to failure, and the failure of a single federate can crash the entire simulation execution. This risk grows with the scale of a distributed simulation, so fault tolerance is required to support runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a 'Decoupled Federate Architecture'. The framework provides a generic fault-tolerant model that handles failure through dynamic substitution. A sender-based method ensures reliable delivery of in-transit messages and is coupled with a novel algorithm for effective fossil collection. The fault-tolerant model also avoids unnecessary repeated computation when handling failure. Because it takes a middleware approach, the framework supports reuse of legacy federate code, is platform-neutral, and is independent of federate modeling approaches. Experiments using a supply-chain simulation example were carried out to validate and benchmark the fault-tolerant federates. The results show that the framework provides correct failure recovery, incurs only minimal overhead for fault tolerance, and shows promising scalability.
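The abstract does not spell out the sender-based method or the fossil collection algorithm, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes that each sender logs outgoing messages under a sequence number, that the receiver periodically acknowledges the highest sequence number it has safely processed, and that a substitute receiver is brought up to date by replaying whatever remains in the log. The class name `SenderLog` and its methods are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SenderLog:
    """Hypothetical sender-side log of messages that may still be in transit."""

    next_seq: int = 0
    pending: dict[int, str] = field(default_factory=dict)

    def send(self, payload: str) -> int:
        """Log the message before it is handed to the network; return its sequence number."""
        seq = self.next_seq
        self.pending[seq] = payload
        self.next_seq += 1
        return seq

    def fossil_collect(self, acked_up_to: int) -> int:
        """Discard logged messages the receiver has confirmed (seq <= acked_up_to).

        Returns the number of log entries reclaimed.
        """
        stale = [seq for seq in self.pending if seq <= acked_up_to]
        for seq in stale:
            del self.pending[seq]
        return len(stale)

    def replay(self) -> list[tuple[int, str]]:
        """Messages to resend, in order, to a substitute receiver after a failure."""
        return sorted(self.pending.items())


log = SenderLog()
for payload in ["order-1", "order-2", "order-3", "order-4"]:
    log.send(payload)

reclaimed = log.fossil_collect(acked_up_to=1)  # receiver confirmed seq 0 and 1
to_resend = log.replay()                       # only unconfirmed messages remain
```

In this sketch, fossil collection bounds the log to unacknowledged messages, and replay after a failure resends only those, which is one way a recovery scheme can avoid repeating work that the receiver has already completed.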