The growing complexity of software in large-scale, safety-critical cyber-physical systems makes it increasingly difficult to expose, and hence correct, all potential defects. Existing fault tolerance methodologies therefore need to be augmented with new approaches that address latent software defects exposed at runtime. This paper describes an approach that borrows and adapts traditional 'System Health Management' techniques to improve software dependability through simple formal specification of runtime monitoring, diagnosis, and mitigation strategies. The two-level approach, with health management at both the component level and the system level, is demonstrated on a simulated case study of an Air Data Inertial Reference Unit (ADIRU). An ADIRU was categorized as the primary failure source for the in-flight upset of Malaysia Airlines Flight 124 over Perth, Australia, in 2005.