FLARe: a Fault-tolerant Lightweight Adaptive Real-time middleware for distributed real-time and embedded systems

Authors:
Jaiganesh Balasubramanian
Affiliations:
Vanderbilt University, Nashville, TN
Venue:
Proceedings of the 4th Middleware Doctoral Symposium
Year:
2007

Abstract

An important class of distributed real-time and embedded (DRE) applications consists of periodic soft real-time tasks. Timeliness and availability are essential requirements for the correct operation of these applications. Conventional approaches to these challenges tend to rely on non-adaptive, load-agnostic fault tolerance mechanisms within real-time systems. These mechanisms often make ad hoc fault tolerance decisions (e.g., the choice of failover targets) that can further overload already strained resources. Potential adverse consequences of such ad hoc actions include excessive delays for real-time tasks and cascading resource failures. This paper presents FLARe, a middleware that provides adaptive fault tolerance for DRE systems. FLARe's resource management infrastructure monitors system metrics, including CPU utilization, and makes informed, load-aware, and adaptive decisions about an application's fault tolerance configuration (e.g., failover targets and the physical placement of replicas). FLARe also employs decision-making algorithms that adapt these decisions at runtime as faults occur, and it provides trade-offs among timeliness, availability, and performance as resources are overloaded, removed, or added.
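To make the contrast between load-agnostic and load-aware failover concrete, here is a minimal sketch of one possible load-aware failover-target choice. It is not FLARe's algorithm, which the abstract does not specify. It swaps in a simple greedy heuristic: promote the backup whose host would have the lowest projected CPU utilization after absorbing the failed primary's load, skipping hosts that would exceed a bound. The `Replica` type, the `select_failover_target` function, and the default `utilization_bound` of 0.7 are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Replica:
    """A backup replica of a periodic task (hypothetical model, not FLARe's API)."""
    replica_id: str
    host: str


def select_failover_target(
    backups: List[Replica],
    host_utilization: Dict[str, float],
    task_utilization: float,
    utilization_bound: float = 0.7,  # hypothetical, illustrative threshold
) -> Optional[Replica]:
    """Greedy load-aware choice of which backup to promote (illustrative only).

    A load-agnostic scheme might promote the first backup in a static list.
    Here each candidate is ranked by its host's projected CPU utilization
    once the failed primary's load moves there. Hosts that would exceed
    utilization_bound are skipped, so the failover does not push an already
    strained processor into overload. Returns None if no host can absorb it.
    """
    best: Optional[Replica] = None
    best_load = float("inf")
    for replica in backups:
        # Hosts without a monitoring sample are treated as unusable targets.
        current = host_utilization.get(replica.host)
        if current is None:
            continue
        projected = current + task_utilization
        if projected <= utilization_bound and projected < best_load:
            best, best_load = replica, projected
    return best


# Usage: hostA would reach 0.85 (over the bound), hostB 0.50, hostC 0.65.
backups = [Replica("r1", "hostA"), Replica("r2", "hostB"), Replica("r3", "hostC")]
load = {"hostA": 0.65, "hostB": 0.30, "hostC": 0.45}
target = select_failover_target(backups, load, task_utilization=0.2)
print(target.replica_id if target else None)  # prints r2
```

Under these assumptions, a static policy that always promotes `r1` would overload `hostA`, while the load-aware choice lands the task on the least-loaded eligible host. A full system would also need to refresh utilization samples and re-place replicas at runtime as resources are added or removed, which this sketch does not attempt.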