Transparent recovery from intermittent faults in time-triggered distributed systems

Authors:
N. Kandasamy;J. P. Hayes;B. T. Murray
Affiliations:
Electr. Eng. & Comput. Sci. Dept., Michigan Univ., Ann Arbor, MI, USA;-;-
Venue:
IEEE Transactions on Computers
Year:
2003

Citing 0
Cited 14

Design Optimization of Time- and Cost-Constrained Fault-Tolerant Distributed Embedded Systems
Proceedings of the conference on Design, Automation and Test in Europe - Volume 2

Synthesis of fault-tolerant schedules with transparency/performance trade-offs for distributed embedded systems
Proceedings of the conference on Design, automation and test in Europe: Proceedings

Scheduling and voltage scaling for energy/reliability trade-offs in fault-tolerant time-triggered embedded systems
CODES+ISSS '07 Proceedings of the 5th IEEE/ACM international conference on Hardware/software codesign and system synthesis

FLARe: a Fault-tolerant Lightweight Adaptive Real-time middleware for distributed real-time and embedded systems
Proceedings of the 4th on Middleware doctoral symposium

Scheduling of fault-tolerant embedded systems with soft and hard timing constraints
Proceedings of the conference on Design, automation and test in Europe

Synthesis of fault-tolerant embedded systems
Proceedings of the conference on Design, automation and test in Europe

Design optimization of time- and cost-constrained fault-tolerant embedded systems with checkpointing and replication
IEEE Transactions on Very Large Scale Integration (VLSI) Systems

Analysis and optimization of fault-tolerant embedded systems with hardened processors
Proceedings of the Conference on Design, Automation and Test in Europe

Combined architecture and hardening techniques exploration for reliable embedded system design
Proceedings of the 21st edition of the great lakes symposium on Great lakes symposium on VLSI

Cost-effective safety and fault localization using distributed temporal redundancy
CASES '11 Proceedings of the 14th international conference on Compilers, architectures and synthesis for embedded systems

Analysis and optimization of fault-tolerant task scheduling on multiprocessor embedded systems
CODES+ISSS '11 Proceedings of the seventh IEEE/ACM/IFIP international conference on Hardware/software codesign and system synthesis

A self-checking hardware journal for a fault-tolerant processor architecture
International Journal of Reconfigurable Computing - Special issue on selected papers from the international workshop on reconfigurable communication-centric systems on chips (ReCoSoC' 2010)

Scheduling and Optimization of Fault-Tolerant Embedded Systems with Transparency/Performance Trade-Offs
ACM Transactions on Embedded Computing Systems (TECS)

Using explicit output comparisons for fault tolerant scheduling (FTS) on modern high-performance processors
Proceedings of the Conference on Design, Automation and Test in Europe

Quantified Score

Hi-index	14.98

Abstract

The time-triggered model, with tasks scheduled statically (offline), provides a high degree of timing predictability in safety-critical distributed systems. Such systems must also tolerate transient and intermittent failures, which occur far more frequently than permanent ones. Software-based recovery methods using temporal redundancy, such as task reexecution and primary/backup, incur performance overhead but are cost-effective ways of handling these failures. We present a constructive approach to integrating runtime recovery policies in a time-triggered distributed system. The method also provides transparent failure recovery, in that a processor recovering from task failures does not disrupt the operation of other processors. Given a general task graph with precedence and timing constraints and a specific fault model, the proposed method constructs the corresponding fault-tolerant (FT) schedule with sufficient slack to accommodate recovery. We introduce the cluster-based failure recovery concept, which determines the best placement of slack within the FT schedule so as to minimize the resulting time overhead. Contingency schedules, also generated offline, revise this FT schedule to mask task failures on individual processors while preserving precedence and timing constraints. We present simulation results which show that, for small-scale embedded systems having task graphs of moderate complexity, the proposed approach generates FT schedules that incur about 30-40 percent performance overhead compared with the corresponding non-fault-tolerant ones.
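
To make the idea of slack placement concrete, the following is a minimal sketch, not the authors' algorithm. It assumes recovery by reexecution, a fault model of at most k task failures per group of tasks on one processor, and made-up task names and execution times. Under those assumptions it compares reserving slack for every task separately with reserving one shared block of slack for the whole group, which is one plausible reading of why grouping tasks can lower time overhead. All names here (`Task`, `per_task_slack`, `shared_cluster_slack`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    wcet: int  # worst-case execution time, in abstract time units


def per_task_slack(tasks, k):
    """Slack needed if each task reserves room for k reexecutions of itself."""
    return sum(k * t.wcet for t in tasks)


def shared_cluster_slack(tasks, k):
    """Slack needed if the group shares one reserve.

    Assumption: at most k failures hit the whole group, so the worst case
    is k reexecutions of the longest task in the group.
    """
    return k * max(t.wcet for t in tasks)


def schedule_length(tasks, slack):
    """Length of a sequential schedule on one processor, including slack."""
    return sum(t.wcet for t in tasks) + slack


def overhead_percent(tasks, slack):
    """Slack as a percentage of the fault-free schedule length."""
    base = sum(t.wcet for t in tasks)
    return 100.0 * slack / base


# Illustrative values only; they are not taken from the paper.
cluster = [Task("sense", 3), Task("filter", 5), Task("actuate", 2)]
k = 1

naive = per_task_slack(cluster, k)         # 3 + 5 + 2 = 10
shared = shared_cluster_slack(cluster, k)  # 1 * 5 = 5
```

With these made-up numbers the fault-free schedule takes 10 time units, per-task slack doubles it, and shared slack adds 5 units. How tasks are actually grouped, and how precedence and deadlines across processors constrain the slack, is the subject of the paper and is not modeled here.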