Midas is an interdisciplinary approach to supporting state-machine replication for nondeterministic distributed applications. The approach uses compile-time static analysis to identify both first-hand and second-hand sources of nondeterminism. At runtime, it compensates for that nondeterminism either by transferring nondeterministic checkpoints or by re-executing inserted code, restoring consistency among replicas before each new client request. The approach avoids the need for lock-step synchronization and leverages application-level insight to address only the nondeterminism that matters. Our preliminary evaluation demonstrates Midas' feasibility and quantifies its current performance overheads.
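
The compensation step described above can be illustrated with a toy sketch. It is not Midas' implementation: the `Replica` class, the `replicate` driver, the `token` value, and the `% 97` derivation are all hypothetical names and choices made for illustration. The sketch assumes that a first-hand source (here, a per-replica random number) feeds a second-hand value derived from it, that the primary exports only the nondeterministic values as a checkpoint, and that each backup re-executes the derived computation with those values before the next request.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy replica whose handler reads a nondeterministic source (hypothetical)."""
    name: str
    rng: random.Random
    state: dict = field(default_factory=dict)
    nd_log: dict = field(default_factory=dict)  # nondeterministic values of the last request

    def handle(self, key: str) -> int:
        # First-hand nondeterminism: a value that differs from replica to replica.
        token = self.rng.randrange(1_000_000)
        self.nd_log = {"token": token}
        # Second-hand nondeterminism: state derived from the value above.
        self.state[key] = token % 97
        return self.state[key]

    def checkpoint(self) -> dict:
        # Export only the nondeterministic values, not the full application state.
        return dict(self.nd_log)

    def compensate(self, key: str, nd_values: dict) -> None:
        # Re-execute the derived computation with the primary's values.
        self.nd_log = dict(nd_values)
        self.state[key] = nd_values["token"] % 97


def replicate(key: str, primary: Replica, backups: list) -> int:
    """Process one request on all replicas, then restore consistency."""
    reply = primary.handle(key)
    for backup in backups:
        backup.handle(key)  # backups execute independently, without lock-step
    nd_values = primary.checkpoint()
    for backup in backups:
        backup.compensate(key, nd_values)  # done before the next client request
    return reply
```

In this sketch the replicas diverge while handling the request and converge again only at the request boundary, which is the property the abstract attributes to compensation. A real system would also have to decide which nondeterministic values matter to the application, which is the role the abstract assigns to static analysis.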