The High Level Architecture (HLA) is a standard for the interoperability and reuse of simulation components, referred to as federates. Large-scale HLA-compliant simulations are built to study complex problems, and they often involve a large number of federates and vast computing resources. Simulation federates running at different locations are liable to failure, and the failure of one federate can crash the overall simulation execution. This risk increases with the scale of a distributed simulation. Hence, fault tolerance is required to support runtime robustness. This paper introduces a framework for robust HLA-based distributed simulations using a "Decoupled Federate Architecture". The framework exploits this architecture to provide a generic fault-tolerant model that uses a "dynamic substitution" approach to deal with failure. A sender-based method is designed to ensure reliable delivery of in-transit messages, and it is coupled with a novel algorithm that performs effective fossil collection. The fault-tolerant model also avoids any unnecessary repeated computation when handling failure. The framework supports reuse of legacy federate code, and it is platform-neutral and independent of federate modeling approaches. Experiments have been carried out to validate and benchmark the fault-tolerant federates using a simple supply-chain simulation as an example. The experimental results show that the framework provides correct failure recovery and indicate that the overhead of facilitating fault tolerance is minimal.
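The abstract does not spell out the sender-based method or the fossil collection algorithm, so the following is only a minimal sketch of the general idea behind sender-based message logging, not the paper's design. It assumes that each sender keeps every message it has sent until the receiver confirms it is safely processed, that confirmed entries are then discarded (fossil collection), and that a substitute receiver is brought up to date by replaying whatever remains. The class `SenderLog`, its methods, and the `deliver` callback are hypothetical names introduced for illustration.

```python
from collections import OrderedDict


class SenderLog:
    """Hypothetical sender-side log of messages that may need replaying."""

    def __init__(self):
        self._next_seq = 0
        self._log = OrderedDict()  # sequence number -> payload, in send order

    def send(self, payload, deliver):
        """Log the payload first, then hand it to the delivery callback."""
        seq = self._next_seq
        self._next_seq += 1
        self._log[seq] = payload
        deliver(seq, payload)
        return seq

    def acknowledge(self, seq):
        """Fossil collection: drop every entry up to and including seq."""
        for logged in [s for s in self._log if s <= seq]:
            del self._log[logged]

    def replay(self, deliver):
        """Resend all unacknowledged messages, in order, to a substitute."""
        for seq, payload in self._log.items():
            deliver(seq, payload)

    def pending(self):
        """Sequence numbers still held in the log."""
        return list(self._log)


# Assumed scenario: the original receiver confirms message 0, then fails.
log = SenderLog()
original = []
for item in ("order-1", "order-2", "order-3"):
    log.send(item, lambda s, p: original.append((s, p)))
log.acknowledge(0)

# A substitute receiver gets only the messages that were never confirmed.
substitute = []
log.replay(lambda s, p: substitute.append((s, p)))
print(substitute)  # [(1, 'order-2'), (2, 'order-3')]
```

Keeping the log on the sender side means the receiver's failure cannot destroy the messages needed for its recovery, while acknowledgements bound how large the log grows.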