Deliberative, search-based mitigation strategies for model-based software health management

Authors:
Nagabhushan Mahadevan; Abhishek Dubey; Daniel Balasubramanian; Gabor Karsai
Affiliations:
Department of Electrical Engineering and Computer Science, Institute for Software-Integrated Systems, Vanderbilt University, Nashville, USA 37212 (all authors)
Venue:
Innovations in Systems and Software Engineering
Year:
2013

Abstract

Rising software complexity makes it very difficult to analyze aerospace systems and prepare them for all possible fault scenarios at design time; therefore, classical run-time fault tolerance techniques such as self-checking pairs and triple modular redundancy are used. However, several recent incidents have made it clear that existing software fault tolerance techniques alone are not sufficient. To improve system dependability, simpler, yet formally specified and verified, run-time monitoring, diagnosis, and fault mitigation capabilities are needed. Such architectures are already in use for managing the health of vehicles and systems. Software health management is the application of these techniques to software systems. In this paper, we briefly describe the software health management techniques and architecture developed by our research group. The foundation of the architecture is a real-time component framework (built upon ARINC-653 platform services) that defines a model of computation for software components. Two dedicated architectural elements, the Component Level Health Manager (CLHM) and the System Level Health Manager (SLHM), provide the health management services: anomaly detection, fault source isolation, and fault mitigation. The SLHM includes a diagnosis engine that (1) uses a Timed Failure Propagation Graph (TFPG) model derived from the component assembly model, (2) reasons about cascading fault effects in the system, and (3) isolates the fault source component(s). Thereafter, the appropriate system-level mitigation action is taken. The main focus of this article is the fault mitigation architecture, which uses goal-based deliberative reasoning to determine the best mitigation actions for recovering the system from the identified failure mode.
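
To make the idea of goal-based, search-based mitigation concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation: the abstract does not specify the search procedure, so a plain breadth-first search is used here as a stand-in. The component names, the `GOALS` table, and the `reset`/`activate` action vocabulary are all hypothetical. The sketch assumes that diagnosis has already isolated the faulty components, and it looks for the shortest sequence of actions after which every system goal is again supported by healthy components.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, List, Optional, Tuple

# Hypothetical goal model: each system function lists alternative component
# sets; the function is available if all components of at least one
# alternative are healthy.
Goals = Dict[str, List[FrozenSet[str]]]
Action = Tuple[str, str]  # (action kind, target component)

GOALS: Goals = {
    "navigation": [frozenset({"gps", "nav_filter"}),
                   frozenset({"ins", "nav_filter"})],
    "display": [frozenset({"display"})],
}


def goals_met(healthy: FrozenSet[str], goals: Goals) -> bool:
    """True if every goal has at least one fully healthy alternative."""
    return all(any(alt <= healthy for alt in alts) for alts in goals.values())


def plan_mitigation(healthy: Iterable[str],
                    goals: Goals,
                    resettable: Iterable[str],
                    spares: Iterable[str],
                    max_depth: int = 3) -> Optional[List[Action]]:
    """Breadth-first search for a shortest action sequence restoring all goals.

    Assumption: resetting a resettable component or activating a spare always
    succeeds and makes that component healthy.
    """
    start = frozenset(healthy)
    candidates: List[Action] = (
        [("reset", c) for c in sorted(resettable)]
        + [("activate", s) for s in sorted(spares)]
    )
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, plan = queue.popleft()
        if goals_met(state, goals):
            return plan
        if len(plan) >= max_depth:
            continue
        for action in candidates:
            component = action[1]
            if component in state:
                continue  # nothing to repair or start
            nxt = state | {component}
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, plan + [action]))
    return None  # no plan within the depth bound


if __name__ == "__main__":
    # "gps" was isolated as the fault source and cannot be reset.
    print(plan_mitigation({"nav_filter", "display"}, GOALS,
                          resettable=set(), spares={"ins"}))
```

In this toy setting the search returns a single `activate` action on the hypothetical spare `ins`, because that is the shortest way to restore the `navigation` goal. A real system would also need to account for timing, action cost, and action failure, none of which this sketch models.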