This paper describes an analysis of hardware-related software (HW/SW) errors on an MVS/SP operating system at Stanford University. The analysis procedure demonstrates a methodology for evaluating the interaction between hardware and software as it relates to system reliability. The paper examines how the operating system handles HW/SW errors and how effective recovery management is. Nearly 35 percent of all observed software failures were found to be hardware-related. The analysis shows that the operating system is seldom able to diagnose that a software error may be hardware-related. The impact of HW/SW errors on the system is evaluated by measuring how effectively system recovery contains their propagation. The system failure probability for HW/SW errors is close to three times that for software errors in general. The observed HW/SW errors follow a specific pattern, suggesting that such error patterns could be used for intelligent error prediction and recovery.