Rectifying orphan components using group-failover in distributed real-time and embedded systems

Authors:
Sumant Tambe;Aniruddha Gokhale
Affiliations:
Vanderbilt University, Nashville, TN, USA;Vanderbilt University, Nashville, TN, USA
Venue:
Proceedings of the 14th international ACM Sigsoft symposium on Component based software engineering
Year:
2011

Abstract

Orphan requests are a significant problem for multi-tier distributed systems because they adversely impact system correctness by violating the exactly-once semantics of applications, and they may waste resources. Orphan requests stem from failures of non-deterministic components involved in nested invocations of replicated components. Resolving this problem in the context of resource-constrained, component-based, distributed real-time and embedded (DRE) systems that form end-to-end task chains is challenging because conventional transaction-based solutions cannot assure the real-time properties of DRE applications. To address these challenges, this paper presents a group-failover protocol that comprises three key capabilities: real-time failure detection and client failover, timely mitigation of orphan requests, and two novel application state consistency strategies that ensure the correctness of DRE systems by maintaining exactly-once semantics even during failures. Our solution is implemented in the context of the CIAO real-time CORBA Component Model middleware. Empirical evaluations of the group-failover protocol in both fault-free and failure recovery scenarios for DRE task chains of different sizes demonstrate low overhead and predictable performance.