This paper proposes an approach to software fault diagnosis in complex fault-tolerant systems, encompassing the phases of error detection, fault location, and system recovery. Errors are detected in the first phase by exploiting operating system support. Faults are then identified during the location phase through a machine-learning-based approach, and once a fault is located, the most appropriate recovery action is triggered. Feedback actions are also used during the location phase to improve detection quality over time. A real-world application from the Air Traffic Control domain has been used as a case study to evaluate the proposed approach. Experimental results, obtained by means of fault injection, show that the diagnosis engine is able to diagnose faults with high accuracy and at low overhead.
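The detection, location, recovery and feedback pipeline described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical and does not come from the paper. The `ErrorEvent` features, the detection thresholds, the fault classes and the `RECOVERY_ACTIONS` table are all illustrative assumptions. The abstract does not name the machine learning technique, so a simple k-nearest-neighbour classifier (`KNNLocator`) stands in for it here. Feedback is modeled only as adding confirmed diagnoses to the classifier's training set.

```python
import math
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical mapping from located fault class to recovery action (assumption).
RECOVERY_ACTIONS = {
    "crash": "restart_process",
    "hang": "kill_and_restart",
    "value_error": "rollback_to_checkpoint",
}


@dataclass
class ErrorEvent:
    """Symptoms observed through OS-level monitors (hypothetical features)."""
    exit_signal: int           # e.g. 11 for SIGSEGV, 0 if the process is still alive
    cpu_idle_ratio: float      # fraction of time the process was idle
    syscall_error_rate: float  # fraction of failing system calls


def detect(event: ErrorEvent) -> bool:
    """Detection phase: flag an error from OS-observable symptoms (illustrative thresholds)."""
    return (event.exit_signal != 0
            or event.cpu_idle_ratio > 0.95
            or event.syscall_error_rate > 0.2)


def features_of(event: ErrorEvent) -> tuple:
    """Turn an event into a numeric feature vector for the classifier."""
    return (float(event.exit_signal != 0), event.cpu_idle_ratio, event.syscall_error_rate)


@dataclass
class KNNLocator:
    """Location phase: k-nearest-neighbour stand-in for the paper's ML classifier."""
    k: int = 3
    samples: list = field(default_factory=list)  # list of (feature_vector, fault_label)

    def train(self, features, label: str) -> None:
        self.samples.append((tuple(features), label))

    def locate(self, features) -> str:
        # Majority vote among the k training samples closest in Euclidean distance.
        nearest = sorted(self.samples, key=lambda s: math.dist(s[0], features))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def diagnose(event: ErrorEvent, locator: KNNLocator):
    """Run detection, then location, then pick a recovery action; None if no error."""
    if not detect(event):
        return None
    fault = locator.locate(features_of(event))
    return RECOVERY_ACTIONS[fault]


def feedback(event: ErrorEvent, confirmed_fault: str, locator: KNNLocator) -> None:
    """Feedback: store a confirmed diagnosis so later locations can use it."""
    locator.train(features_of(event), confirmed_fault)


if __name__ == "__main__":
    locator = KNNLocator(k=1)
    locator.train((1.0, 0.1, 0.0), "crash")
    locator.train((0.0, 0.99, 0.0), "hang")
    locator.train((0.0, 0.2, 0.5), "value_error")
    print(diagnose(ErrorEvent(11, 0.05, 0.01), locator))  # prints "restart_process"
```

Keeping detection as a cheap threshold check and invoking the classifier only after an error has been flagged is one way to keep overhead low in the common error-free case. This is a design choice of the sketch, not a claim about how the original engine is built.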