Software health management (SWHM) is an emerging field that addresses the critical need to detect, diagnose, predict, and mitigate adverse events due to software faults and failures. These faults can arise for numerous reasons, including coding errors, unanticipated faults or failures in hardware, and problematic interactions with the external environment. This paper demonstrates a novel approach to software health management based on a rigorous Bayesian formulation that monitors the behavior of the software and the operating system, performs probabilistic diagnosis, and provides information about the most likely root causes of a failure or software problem. Translating the Bayesian network model into an efficient data structure, an arithmetic circuit, makes it possible to perform SWHM on resource-restricted embedded computing platforms such as those found in aircraft, unmanned aircraft, or satellites. SWHM is especially important for safety-critical systems such as aircraft control systems. We demonstrate our Bayesian SWHM system on three realistic scenarios from an aircraft control system: (1) aircraft file-system-based faults, (2) signal handling faults, and (3) navigation faults due to inertial measurement unit (IMU) failure or compromised Global Positioning System (GPS) integrity. We show that the method successfully detects and diagnoses faults in these scenarios. We also discuss the importance of verification and validation of SWHM systems.
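To illustrate how a Bayesian network compiled into an arithmetic circuit can support diagnosis, the following is a minimal sketch and not the system described in this paper. It assumes a hypothetical two-node network (a health node `H` with a sensor node `S` as its child) with made-up probabilities; the names `PRIOR`, `CPT`, `evaluate_circuit`, and `posterior_faulty` are illustrative only. The circuit evaluates the network polynomial, a sum of products of evidence indicators and network parameters, and the posterior of a fault is the ratio of two such evaluations.

```python
# Hypothetical two-node network: health node H -> sensor node S.
# All probabilities below are assumed values for illustration only.
PRIOR = {"ok": 0.99, "faulty": 0.01}           # P(H)
CPT = {                                        # P(S | H)
    ("ok", "nominal"): 0.95,
    ("ok", "error"): 0.05,
    ("faulty", "nominal"): 0.10,
    ("faulty", "error"): 0.90,
}
SENSOR_STATES = ("nominal", "error")


def evaluate_circuit(lam_h, lam_s):
    """Evaluate the network polynomial
        f = sum_h lam_h[h] * P(h) * sum_s lam_s[s] * P(s | h)
    using only additions and multiplications, as an arithmetic circuit would.
    lam_h and lam_s are evidence indicators (1.0 = state allowed, 0.0 = ruled out).
    """
    total = 0.0
    for h, p_h in PRIOR.items():
        inner = sum(lam_s[s] * CPT[(h, s)] for s in SENSOR_STATES)
        total += lam_h[h] * p_h * inner
    return total


def posterior_faulty(observed):
    """Return P(H = faulty | S = observed) as a ratio of two circuit evaluations."""
    lam_s = {s: 1.0 if s == observed else 0.0 for s in SENSOR_STATES}
    # Probability of the evidence alone: all health states allowed.
    p_evidence = evaluate_circuit({"ok": 1.0, "faulty": 1.0}, lam_s)
    # Joint probability of the evidence and H = faulty.
    p_joint = evaluate_circuit({"ok": 0.0, "faulty": 1.0}, lam_s)
    return p_joint / p_evidence


if __name__ == "__main__":
    for obs in SENSOR_STATES:
        print(obs, posterior_faulty(obs))
```

With these assumed numbers, observing `error` raises the fault probability from the prior of 0.01 to roughly 0.15. Each query costs a fixed number of additions and multiplications, which is the property that makes circuit evaluation attractive on resource-restricted platforms.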