In this paper we study how to reuse checkpoints taken in an uncorrelated manner during the forward execution phase of an optimistic simulation system in order to construct globally consistent snapshots that are also committed (i.e., the logical time they refer to is lower than the current GVT value). This is done by introducing a heuristic-based mechanism that applies update operations to the local committed checkpoints of the involved logical processes, so as to eliminate mutual dependencies among the final state values. The mechanism is lightweight, since it requires no form of (distributed) coordination to determine which checkpoint update operations to perform. At the same time, it is likely to reduce the number of checkpoint update operations required to realign the consistent global state exactly to the current GVT value, taken as the reference time for the snapshot. Our proposal can support, in a performance-effective manner, termination detection schemes based on global predicates evaluated on a committed and consistent global snapshot, which are an alternative as relevant as the classical termination check that relies only on the current GVT value. Another application concerns interactive simulation environments, where (aggregate) output information about committed and consistent snapshots needs to be provided frequently, hence requiring lightweight mechanisms for constructing the snapshots.
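To make the notion of a committed checkpoint concrete, the following minimal sketch picks, for each logical process, the most recent checkpoint whose logical time is lower than the current GVT, and then evaluates a termination predicate on the resulting states. All names (`latest_committed`, `committed_candidates`, `predicate_holds`) and the checkpoint layout are assumptions made for illustration. The sketch deliberately omits the heuristic update step that removes mutual dependencies among the selected checkpoints, which the abstract does not specify, so on their own the selected checkpoints are committed but not necessarily mutually consistent.

```python
from bisect import bisect_left
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout: a checkpoint is (logical time, saved state).
Checkpoint = Tuple[float, dict]


def latest_committed(checkpoints: List[Checkpoint], gvt: float) -> Optional[Checkpoint]:
    """Return the most recent checkpoint with logical time strictly below GVT.

    Assumes the list is sorted by increasing logical time.
    """
    times = [t for t, _ in checkpoints]
    idx = bisect_left(times, gvt)  # first position whose time is >= gvt
    return checkpoints[idx - 1] if idx > 0 else None


def committed_candidates(
    logs: Dict[str, List[Checkpoint]], gvt: float
) -> Dict[str, Optional[Checkpoint]]:
    """Select one committed checkpoint per logical process, with no coordination."""
    return {lp: latest_committed(cps, gvt) for lp, cps in logs.items()}


def predicate_holds(
    candidates: Dict[str, Optional[Checkpoint]],
    predicate: Callable[[Dict[str, dict]], bool],
) -> bool:
    """Evaluate a global predicate on the selected states.

    Returns False if some logical process has no committed checkpoint yet.
    """
    if any(cp is None for cp in candidates.values()):
        return False
    return predicate({lp: cp[1] for lp, cp in candidates.items()})


# Illustrative data: two logical processes with uncorrelated checkpoint times.
logs = {
    "lp0": [(1.0, {"done": False}), (4.0, {"done": True}), (9.0, {"done": True})],
    "lp1": [(2.0, {"done": True}), (7.0, {"done": True})],
}
candidates = committed_candidates(logs, gvt=5.0)
all_done = predicate_holds(candidates, lambda s: all(v["done"] for v in s.values()))
```

With a GVT of 5.0 the selected checkpoints refer to logical times 4.0 and 2.0, which illustrates why uncorrelated checkpoints need further realignment before they can be treated as one consistent global snapshot.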