Living with nondeterminism in replicated middleware applications

Authors:
Joseph Slember; Priya Narasimhan
Affiliations:
Carnegie Mellon University, Pittsburgh, PA; Carnegie Mellon University, Pittsburgh, PA
Venue:
Middleware '06: Proceedings of the 7th ACM/IFIP/USENIX International Conference on Middleware
Year:
2006

Abstract

Application-level nondeterminism can lead to inconsistent replica state, which defeats the purpose of replication as a fault-tolerance strategy. We present Midas, a new approach for living with nondeterminism in distributed, replicated middleware applications. Midas combines (i) static program analysis of the application's source code prior to replica deployment and (ii) online compensation of replica divergence while the replicas execute. We identify the sources of nondeterminism within the application, discriminate between actual and superficial nondeterminism, and track the propagation of actual nondeterminism. We evaluate our techniques for the active replication of servers using micro-benchmarks that contain various sources of nondeterminism (multi-threading, system calls, and propagation).
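
The problem the abstract describes can be illustrated with a toy model. The Python sketch below is not Midas and does not reproduce its analysis or its compensation protocol; the `Replica` class, its per-replica clock, and the two driver functions are hypothetical names introduced only for illustration. It assumes that "superficial" nondeterminism means a nondeterministic value that never reaches replica state, and that "actual" nondeterminism means one that does. Compensation is modeled in the simplest way we could think of: one replica's nondeterministic value is forwarded to the others.

```python
class Replica:
    """Toy actively replicated server (hypothetical, for illustration only)."""

    def __init__(self, clock_start):
        # Stand-in for a local nondeterministic source such as gettimeofday():
        # each replica's clock starts at a different value.
        self._clock = clock_start
        self.balance = 0
        self.last_stamp = None

    def local_time(self):
        self._clock += 1
        return self._clock

    def handle(self, amount, stamp=None):
        # Superficial nondeterminism (assumed meaning): the value is read,
        # for example for logging, but never stored in replica state.
        self.local_time()
        # Actual nondeterminism: the timestamp becomes part of the state,
        # unless a value forwarded from another replica is supplied.
        if stamp is None:
            stamp = self.local_time()
        self.balance += amount
        self.last_stamp = stamp
        return stamp

    def state(self):
        return (self.balance, self.last_stamp)


def run_uncompensated(requests, clock_starts):
    """Every replica executes every request using its own local clock."""
    replicas = [Replica(c) for c in clock_starts]
    for amount in requests:
        for replica in replicas:
            replica.handle(amount)
    return [replica.state() for replica in replicas]


def run_compensated(requests, clock_starts):
    """The first replica's timestamp is forwarded so the others adopt it."""
    replicas = [Replica(c) for c in clock_starts]
    for amount in requests:
        first, *others = replicas
        stamp = first.handle(amount)
        for replica in others:
            replica.handle(amount, stamp=stamp)
    return [replica.state() for replica in replicas]


if __name__ == "__main__":
    print(run_uncompensated([5, 7], [100, 200, 300]))
    print(run_compensated([5, 7], [100, 200, 300]))
```

In the uncompensated run the balances agree but the stored timestamps differ, so the replicas are inconsistent even though they processed the same requests in the same order. In the compensated run all replicas end in the same state. The discarded clock read still differs across replicas in both runs, which is harmless under the assumption above because it never reaches replica state.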