An important step in achieving robustness to run-time faults is the ability to detect and repair problems when they arise in a running system. Effective fault detection and repair could be greatly enhanced by run-time fault diagnosis and localization, since these would allow the repair mechanisms to focus adaptation effort on the parts of the system most in need of attention. In this paper we describe an approach to run-time fault diagnosis that combines architectural models with spectrum-based reasoning for multiple fault localization. Spectrum-based reasoning is a lightweight technique that takes an abstraction of execution traces (which components took part in each run, and whether the run failed) and produces a list of likely fault candidates ordered by probability. We show how this technique can be combined with architectural models to support run-time diagnosis that can (a) scale to modern distributed software systems; (b) accommodate black-box components and proprietary infrastructure for which one has neither a specification nor source code; and (c) handle inherent uncertainty about the probable cause of a problem, even in the face of transient faults and faults that arise only when certain combinations of system components interact.
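The abstract does not spell out the diagnosis algorithm. As an illustration only, the following is a minimal Python sketch of spectrum-based multiple-fault reasoning in the Barinel style. Each failing run yields a conflict (the set of components it involved), candidate diagnoses are the minimal hitting sets of those conflicts, and each candidate is ranked by a Bayesian posterior. That posterior uses a per-component "goodness" `h_j` (the chance that a faulty component still behaves correctly, which is how intermittent faults are accommodated) fitted by maximum likelihood. The function names, the grid-based coordinate ascent, the prior of 0.01, the candidate size cap of 2, and the web/cache/db example are all assumptions made for this sketch, not the paper's implementation.

```python
# Illustrative sketch only: names, grid search, and prior are hypothetical choices.
from itertools import combinations
from math import prod

# Candidate values for h_j, the probability that a faulty component still
# behaves correctly in a run (this is what lets intermittent faults be explained).
GRID = [i / 100 for i in range(1, 100)]


def build_spectra(transactions, components):
    """Turn (components_involved, failed) traces into a hit-spectrum matrix."""
    index = {name: j for j, name in enumerate(components)}
    spectra, errors = [], []
    for involved, failed in transactions:
        row = [0] * len(components)
        for name in involved:
            row[index[name]] = 1
        spectra.append(row)
        errors.append(1 if failed else 0)
    return spectra, errors


def minimal_hitting_sets(conflicts, max_size):
    """Brute-force minimal hitting sets of the failing-run conflicts."""
    comps = sorted(set().union(*conflicts))
    found = []
    for size in range(1, max_size + 1):
        for cand in combinations(comps, size):
            s = frozenset(cand)
            if any(f <= s for f in found):
                continue  # contains a smaller hitting set, so not minimal
            if all(s & c for c in conflicts):
                found.append(s)
    return found


def likelihood(cand, spectra, errors, h):
    """Pr(observed pass/fail outcomes | cand is the set of faulty components)."""
    result = 1.0
    for row, failed in zip(spectra, errors):
        # A run passes only if every faulty component it touched behaved well.
        ok = prod(h[j] for j in cand if row[j])
        result *= (1.0 - ok) if failed else ok
    return result


def fit_goodness(cand, spectra, errors, sweeps=5):
    """Approximate maximum-likelihood h via coordinate ascent over GRID."""
    h = {j: 0.5 for j in cand}
    for _ in range(sweeps):
        for j in cand:
            h[j] = max(GRID, key=lambda g: likelihood(cand, spectra, errors, {**h, j: g}))
    return h


def rank_candidates(spectra, errors, max_size=2, prior=0.01):
    """Return (candidate, posterior) pairs, most probable first."""
    n = len(spectra[0])
    conflicts = [frozenset(j for j, hit in enumerate(row) if hit)
                 for row, failed in zip(spectra, errors) if failed]
    scored = []
    for cand in minimal_hitting_sets(conflicts, max_size):
        h = fit_goodness(cand, spectra, errors)
        p0 = prior ** len(cand) * (1 - prior) ** (n - len(cand))
        scored.append((cand, p0 * likelihood(cand, spectra, errors, h)))
    total = sum(score for _, score in scored) or 1.0
    return sorted(((set(c), score / total) for c, score in scored),
                  key=lambda pair: -pair[1])


if __name__ == "__main__":
    # Hypothetical architectural components and observed transactions.
    components = ["web", "cache", "db"]
    transactions = [
        (["web", "cache"], True),
        (["web", "db"], True),
        (["web", "cache", "db"], True),
        (["web"], False),
    ]
    spectra, errors = build_spectra(transactions, components)
    for cand, p in rank_candidates(spectra, errors):
        print(sorted(components[j] for j in cand), round(p, 3))
```

With these four transactions the sketch ranks the single-fault candidate {web} above the two-fault candidate {cache, db}. The component web appears in every failed run, whereas the two-component explanation pays for its smaller prior. Because the spectra record only which components took part in each run, nothing in the sketch needs source code or a specification of the components themselves.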