Implementing fault-tolerant services using the state machine approach: a tutorial
ACM Computing Surveys (CSUR)
Replay, recovery, replication, and snapshots of nondeterministic concurrent programs
PODC '91 Proceedings of the tenth annual ACM symposium on Principles of distributed computing
Hypervisor-based fault tolerance
ACM Transactions on Computer Systems (TOCS) - Special issue on operating system principles
Six misconceptions about reliable distributed computing
Proceedings of the 8th ACM SIGOPS European workshop on Support for composing distributed applications
X-ability: a theory of replication
Proceedings of the nineteenth annual ACM symposium on Principles of distributed computing
A Low Latency, Loss Tolerant Architecture and Protocol for Wide Area Group Communication
DSN '00 Proceedings of the 2000 International Conference on Dependable Systems and Networks (formerly FTCS-30 and DCCA-8)
Supporting nondeterministic execution in fault-tolerant systems
FTCS '96 Proceedings of the The Twenty-Sixth Annual International Symposium on Fault-Tolerant Computing (FTCS '96)
TFT: A Software System for Application-Transparent Fault Tolerance
FTCS '98 Proceedings of the The Twenty-Eighth Annual International Symposium on Fault-Tolerant Computing
Enforcing Determinism for the Consistent Replication of Multithreaded CORBA Applications
SRDS '99 Proceedings of the 18th IEEE Symposium on Reliable Distributed Systems
Deterministic Scheduling for Transactional Multithreaded Replicas
SRDS '00 Proceedings of the 19th IEEE Symposium on Reliable Distributed Systems
NCA '01 Proceedings of the IEEE International Symposium on Network Computing and Applications (NCA'01)
SRDS '04 Proceedings of the 23rd IEEE International Symposium on Reliable Distributed Systems
An integrated experimental environment for distributed systems and networks
OSDI '02 Proceedings of the 5th symposium on Operating systems design and implementationCopyright restrictions prevent ACM from being able to make the PDFs for this conference available for downloading
MEAD: support for Real-Time Fault-Tolerant CORBA: Research Articles
Concurrency and Computation: Practice & Experience - Foundations of Middleware Technologies
Nondeterminism in ORBs: The Perception and the Reality
DEXA '06 Proceedings of the 17th International Conference on Database and Expert Systems Applications
Managing self-inflicted nondeterminism
HotDep'05 Proceedings of the First conference on Hot topics in system dependability
Hi-index | 0.00 |
Application-level nondeterminism can lead to inconsistent state that defeats the purpose of replication as a fault-tolerance strategy. We present Midas, a new approach for living with nondeterminism in distributed, replicated, middleware applications. Midas exploits (i) the static program analysis of the application's source code prior to replica deployment and (ii) the online compensation of replica divergence even as replicas execute. We identify the sources of nondeterminism within the application, discriminate between actual and superficial nondeterminism, and track the propagation of actual nondeterminism. We evaluate our techniques for the active replication of servers using micro-benchmarks that contain various sources (multi-threading, system calls and propagation) of nondeterminism.