Active Replication of Multithreaded Applications

Authors:
Claudio Basile; Zbigniew Kalbarczyk; Ravishankar K. Iyer
Affiliations:
IEEE;IEEE;IEEE
Venue:
IEEE Transactions on Parallel and Distributed Systems
Year:
2006

Abstract

Software-based active replication incurs substantial performance overhead. Multithreading can improve performance, but thread scheduling introduces nondeterminism into replica behavior. To achieve strong replica consistency in multithreaded environments, this paper proposes intercepting the mutex lock/unlock operations that threads perform when accessing shared data, and contributes two algorithms: 1) a Loose Synchronization Algorithm (LSA), which captures the natural concurrency in a leader replica and projects it onto follower replicas through interreplica communication, and 2) a Preemptive Deterministic Scheduler (PDS) algorithm, which removes the need for interreplica communication through the notion of rounds and by suspending threads that it cannot yet schedule deterministically. The failure behavior and performance of LSA and PDS implementations are evaluated in a triplicated system and compared with existing solutions. The performance evaluation indicates that both LSA and PDS outperform existing solutions, with PDS offering lower throughput than LSA. A fault-injection campaign shows that PDS is more robust to errors than LSA because it involves no interreplica communication. Hence, LSA and PDS represent a trade-off between performance and dependability. Finally, both algorithms are demonstrated by replicating the Apache Web server, a substantial real-world application.
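
The leader/follower idea behind intercepting mutex operations can be illustrated with a small sketch. This is not the paper's LSA implementation: the class names `LeaderMutex` and `FollowerMutex`, the `run_replica` helper, and the in-memory `log` list (standing in for interreplica communication) are all assumptions made for illustration. The leader records the order in which threads acquire a mutex, and the follower grants the same mutex only in that recorded order, so both replicas update shared data identically.

```python
import threading
from collections import deque


class LeaderMutex:
    """Hypothetical wrapper: logs the order in which threads acquire the mutex."""

    def __init__(self, mutex_id, log):
        self._lock = threading.Lock()
        self.mutex_id = mutex_id
        self._log = log  # list of (mutex_id, thread_name); stands in for messages

    def lock(self, thread_name):
        self._lock.acquire()
        # Appended while holding the mutex, so the per-mutex order is exact.
        self._log.append((self.mutex_id, thread_name))

    def unlock(self):
        self._lock.release()


class FollowerMutex:
    """Hypothetical wrapper: grants the mutex only in the leader's logged order."""

    def __init__(self, mutex_id, log):
        self._cond = threading.Condition()
        self._order = deque(t for m, t in log if m == mutex_id)
        self._held = False

    def _my_turn(self, thread_name):
        return (not self._held
                and len(self._order) > 0
                and self._order[0] == thread_name)

    def lock(self, thread_name):
        with self._cond:
            # Suspend the caller until it is next in the leader's order.
            self._cond.wait_for(lambda: self._my_turn(thread_name))
            self._order.popleft()
            self._held = True

    def unlock(self):
        with self._cond:
            self._held = False
            self._cond.notify_all()


def run_replica(mutex, n_threads=4, iters=5):
    """Run worker threads that append their name to shared state under the mutex."""
    state = []

    def worker(name):
        for _ in range(iters):
            mutex.lock(name)
            state.append(name)  # critical section on shared data
            mutex.unlock()

    threads = [threading.Thread(target=worker, args=(f"t{i}",))
               for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state


log = []
leader_state = run_replica(LeaderMutex("m0", log))
follower_state = run_replica(FollowerMutex("m0", log))
print(leader_state == follower_state)
```

In this sketch the follower replays a complete log after the leader has finished. A real system would stream the order to followers while they run, and would need to handle multiple mutexes, message loss, and leader failure, none of which is modeled here.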